NX-AI / xlstm-jax
Official JAX implementation of xLSTM, including fast and efficient training and inference code. The 7B model is available at https://huggingface.co/NX-AI/xLSTM-7b.
☆89 · Updated 2 months ago
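For orientation, here is a minimal sketch of loading the released 7B checkpoint from the Hugging Face Hub. The model id comes from the description above; everything else (compatibility with `AutoModelForCausalLM`, dtype and device settings) is an illustrative assumption, not this repo's own training/inference API:

```python
# Minimal sketch, not code from xlstm-jax: pull the released checkpoint
# from the Hugging Face Hub with `transformers`. The model id comes from
# the repo description; compatibility with AutoModelForCausalLM and the
# dtype/device settings are assumptions for illustration.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "NX-AI/xLSTM-7b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # halves memory vs. float32 on supported GPUs
    device_map="auto",           # place weights on available accelerators
)

prompt = "The xLSTM architecture extends the LSTM with"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```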
Alternatives and similar repositories for xlstm-jax:
Users interested in xlstm-jax are comparing it to the repositories listed below.
- A State-Space Model with Rational Transfer Function Representation. ☆78 · Updated 10 months ago
- 🧱 Modula software package ☆173 · Updated 2 weeks ago
- The AdEMAMix Optimizer: Better, Faster, Older. ☆179 · Updated 6 months ago
- ☆169 · Updated 3 months ago
- ☆149 · Updated 7 months ago
- ☆214 · Updated 8 months ago
- Efficient optimizers ☆184 · Updated 2 weeks ago
- The simplest, fastest repository for training/finetuning medium-sized GPTs. ☆100 · Updated 4 months ago
- Supporting PyTorch FSDP for optimizers ☆79 · Updated 3 months ago
- A MAD laboratory to improve AI architecture designs 🧪 ☆108 · Updated 3 months ago
- An implementation of the PSGD Kron second-order optimizer for PyTorch ☆83 · Updated last month
- CIFAR-10 speedruns: 94% in 2.6 seconds and 96% in 27 seconds ☆221 · Updated 3 weeks ago
- Simple, minimal implementation of the Mamba SSM in one PyTorch file, using logcumsumexp (Heisen sequence). ☆114 · Updated 5 months ago
- Latent Program Network (from the "Searching Latent Program Spaces" paper) ☆72 · Updated last week
- Explorations into the proposal from the paper "Grokfast: Accelerated Grokking by Amplifying Slow Gradients" ☆98 · Updated 3 months ago
- Code repository for Trajectory Flow Matching ☆59 · Updated 4 months ago
- Official implementation of Phi-Mamba, a MOHAWK-distilled model ("Transformers to SSMs: Distilling Quadratic Knowledge to Subquadratic Mode…") ☆100 · Updated 6 months ago
- Accelerated First-Order Parallel Associative Scan ☆177 · Updated 7 months ago
- Quick implementation of nGPT, learning entirely on the hypersphere, from NvidiaAI ☆276 · Updated this week
- Griffin MQA + Hawk Linear RNN Hybrid ☆85 · Updated 10 months ago
- Annotated version of the Mamba paper ☆475 · Updated last year
- ☆286 · Updated 2 months ago
- Official implementation of "ADOPT: Modified Adam Can Converge with Any β₂ with the Optimal Rate" ☆419 · Updated 3 months ago
- Tiled Flash Linear Attention library for fast and efficient mLSTM kernels. ☆42 · Updated this week
- ☆79 · Updated 11 months ago
- Understand and test language model architectures on synthetic tasks. ☆185 · Updated 2 weeks ago
- [ICLR 2025] Official PyTorch implementation of "Gated Delta Networks: Improving Mamba2 with Delta Rule" ☆142 · Updated this week
- When it comes to optimizers, it's always better to be safe than sorry. ☆214 · Updated last month
- ☆58 · Updated 4 months ago