NX-AI / xlstm-jax
Official JAX implementation of xLSTM, including fast and efficient training and inference code. A 7B model is available at https://huggingface.co/NX-AI/xLSTM-7b.
☆ 75 · Updated last week
Alternatives and similar repositories for xlstm-jax:
Users interested in xlstm-jax are comparing it to the libraries listed below:
- ☆ 146 · Updated last month
- The AdEMAMix Optimizer: Better, Faster, Older. ☆ 178 · Updated 4 months ago
- Muon optimizer for neural networks: >30% extra sample efficiency, <3% wallclock overhead ☆ 210 · Updated 2 weeks ago
- The simplest, fastest repository for training/finetuning medium-sized GPTs. ☆ 90 · Updated 2 months ago
- Supporting PyTorch FSDP for optimizers ☆ 75 · Updated last month
- Some preliminary explorations of Mamba's context scaling. ☆ 206 · Updated 11 months ago
- A State-Space Model with Rational Transfer Function Representation. ☆ 76 · Updated 8 months ago
- 🧱 Modula software package ☆ 132 · Updated this week
- Simple, minimal implementation of the Mamba SSM in one PyTorch file, using logcumsumexp (Heisen sequence). ☆ 105 · Updated 3 months ago
- Normalized Transformer (nGPT) ☆ 145 · Updated 2 months ago
- A MAD laboratory to improve AI architecture designs 🧪 ☆ 102 · Updated last month
- Understand and test language model architectures on synthetic tasks. ☆ 175 · Updated this week
- Accelerated First Order Parallel Associative Scan ☆ 169 · Updated 5 months ago
- ☆ 276 · Updated last week
- ☆ 296 · Updated 6 months ago
- Reading list for research topics in state-space models ☆ 254 · Updated 3 weeks ago
- ☆ 149 · Updated 5 months ago
- WIP ☆ 92 · Updated 5 months ago
- Evaluating the Mamba architecture on the Othello game ☆ 44 · Updated 8 months ago
- DeMo: Decoupled Momentum Optimization ☆ 171 · Updated last month
- Simplified Masked Diffusion Language Model ☆ 251 · Updated last month
- ☆ 201 · Updated 6 months ago
- ☆ 241 · Updated 4 months ago
- Memory layers use a trainable key-value lookup mechanism to add extra parameters to a model without increasing FLOPs. Conceptually, spars… ☆ 278 · Updated last month
- Implementation of the proposed minGRU in PyTorch ☆ 272 · Updated last month
- PyTorch implementation of models from the Zamba2 series. ☆ 168 · Updated last month
- 94% on CIFAR-10 in 2.6 seconds 💨 96% in 27 seconds ☆ 195 · Updated last month
- Efficient optimizers ☆ 145 · Updated this week
- ☆ 50 · Updated 3 months ago
- Quick implementation of nGPT, learning entirely on the hypersphere, from NvidiaAI ☆ 270 · Updated 2 months ago
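Several entries above (the accelerated parallel associative scan and the Mamba implementations) build on parallel prefix scans. As background only, and not taken from any of these repositories, here is a minimal pure-Python sketch of the standard Blelloch work-efficient exclusive scan; the name `blelloch_scan` is hypothetical. Real GPU implementations execute the independent updates inside each inner loop in parallel, which is what makes the scan fast.

```python
import operator

def blelloch_scan(xs, op=operator.add, identity=0):
    """Exclusive associative scan (Blelloch up-sweep/down-sweep).

    Sequential here for clarity, but every iteration of each inner
    `for` loop is independent of the others, so a GPU kernel can run
    them concurrently. Requires power-of-two length for simplicity.
    """
    n = len(xs)
    assert n > 0 and (n & (n - 1)) == 0, "power-of-two length for simplicity"
    a = list(xs)

    # Up-sweep (reduce): build partial reductions up an implicit binary tree.
    d = 1
    while d < n:
        for i in range(2 * d - 1, n, 2 * d):
            a[i] = op(a[i - d], a[i])
        d *= 2

    # Down-sweep: replace the root with the identity, then push
    # prefixes back down the tree.
    a[n - 1] = identity
    d = n // 2
    while d >= 1:
        for i in range(2 * d - 1, n, 2 * d):
            t = a[i - d]
            a[i - d] = a[i]
            a[i] = op(t, a[i])
        d //= 2
    return a

# Exclusive prefix sums: each output is the sum of all earlier inputs.
print(blelloch_scan([1, 2, 3, 4, 5, 6, 7, 8]))
# → [0, 1, 3, 6, 10, 15, 21, 28]
```

Because the scan only assumes `op` is associative, the same routine computes running maxima, products, or the linear-recurrence compositions used by SSM-style models by swapping in a different `op` and `identity`.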