NX-AI / xlstm-jax
Official JAX implementation of xLSTM including fast and efficient training and inference code. 7B model available at https://huggingface.co/NX-AI/xLSTM-7b.
☆102 · Updated 8 months ago
Alternatives and similar repositories for xlstm-jax
Users who are interested in xlstm-jax are comparing it to the libraries listed below.
- 🧱 Modula software package ☆237 · Updated last month
- ☆210 · Updated 9 months ago
- The simplest, fastest repository for training/finetuning medium-sized GPTs. ☆160 · Updated 2 months ago
- ☆279 · Updated last year
- Efficient optimizers ☆261 · Updated last month
- The AdEMAMix Optimizer: Better, Faster, Older. ☆186 · Updated last year
- Accelerated First Order Parallel Associative Scan ☆188 · Updated last year
- Evaluating the Mamba architecture on the Othello game ☆48 · Updated last year
- Cost aware hyperparameter tuning algorithm ☆169 · Updated last year
- supporting pytorch FSDP for optimizers ☆84 · Updated 9 months ago
- ☆301 · Updated 8 months ago
- Getting crystal-like representations with harmonic loss ☆194 · Updated 5 months ago
- A MAD laboratory to improve AI architecture designs 🧪 ☆129 · Updated 9 months ago
- ☆229 · Updated last month
- Latent Program Network (from the "Searching Latent Program Spaces" paper) ☆96 · Updated 6 months ago
- CIFAR-10 speedruns: 94% in 2.6 seconds and 96% in 27 seconds ☆293 · Updated 2 months ago
- Normalized Transformer (nGPT) ☆188 · Updated 9 months ago
- ☆187 · Updated last month
- Official repository for the paper "Grokfast: Accelerated Grokking by Amplifying Slow Gradients" ☆563 · Updated last year
- Understand and test language model architectures on synthetic tasks. ☆225 · Updated 2 months ago
- A State-Space Model with Rational Transfer Function Representation. ☆81 · Updated last year
- An implementation of PSGD Kron second-order optimizer for PyTorch ☆96 · Updated last month
- Annotated version of the Mamba paper ☆489 · Updated last year
- $100K or 100 Days: Trade-offs when Pre-Training with Academic Resources ☆146 · Updated 3 months ago
- Simple, minimal implementation of the Mamba SSM in one pytorch file. Using logcumsumexp (Heisen sequence). ☆122 · Updated 11 months ago
- DeMo: Decoupled Momentum Optimization ☆190 · Updated 9 months ago
- ☆57 · Updated 11 months ago
- ☆65 · Updated 10 months ago
- Explorations into the proposal from the paper "Grokfast, Accelerated Grokking by Amplifying Slow Gradients" ☆101 · Updated 8 months ago
- Modular, scalable library to train ML models ☆164 · Updated this week