NX-AI / xlstm-jax
Official JAX implementation of xLSTM including fast and efficient training and inference code. 7B model available at https://huggingface.co/NX-AI/xLSTM-7b.
☆82 · Updated last month
Alternatives and similar repositories for xlstm-jax:
Users who are interested in xlstm-jax are comparing it to the libraries listed below.
- ☆158 · Updated 2 months ago
- supporting pytorch FSDP for optimizers ☆76 · Updated 2 months ago
- Muon optimizer: +~30% sample efficiency with <3% wallclock overhead ☆253 · Updated last week
- The simplest, fastest repository for training/finetuning medium-sized GPTs. ☆95 · Updated 3 months ago
- 🧱 Modula software package ☆145 · Updated this week
- A MAD laboratory to improve AI architecture designs 🧪 ☆102 · Updated 2 months ago
- ☆149 · Updated 6 months ago
- A State-Space Model with Rational Transfer Function Representation. ☆77 · Updated 9 months ago
- Explorations into the proposal from the paper "Grokfast, Accelerated Grokking by Amplifying Slow Gradients" ☆95 · Updated last month
- ☆211 · Updated 7 months ago
- The AdEMAMix Optimizer: Better, Faster, Older. ☆177 · Updated 5 months ago
- ☆78 · Updated 10 months ago
- Understand and test language model architectures on synthetic tasks. ☆181 · Updated last month
- ☆181 · Updated this week
- Cost aware hyperparameter tuning algorithm ☆142 · Updated 7 months ago
- Efficient optimizers ☆169 · Updated this week
- $100K or 100 Days: Trade-offs when Pre-Training with Academic Resources ☆119 · Updated last week
- Evaluating the Mamba architecture on the Othello game ☆44 · Updated 9 months ago
- Quick implementation of nGPT, learning entirely on the hypersphere, from NvidiaAI ☆271 · Updated 3 months ago
- ☆280 · Updated last month
- ViT Prisma is a mechanistic interpretability library for Vision Transformers (ViTs). ☆204 · Updated this week
- PyTorch implementation of models from the Zamba2 series. ☆176 · Updated 3 weeks ago
- CIFAR-10 speedruns: 94% in 2.6 seconds and 96% in 27 seconds ☆205 · Updated this week
- Normalized Transformer (nGPT) ☆152 · Updated 3 months ago
- Implementation of GateLoop Transformer in Pytorch and Jax ☆87 · Updated 8 months ago
- An implementation of PSGD Kron second-order optimizer for PyTorch ☆83 · Updated last week