NX-AI / xlstm-jax
Official JAX implementation of xLSTM including fast and efficient training and inference code. 7B model available at https://huggingface.co/NX-AI/xLSTM-7b.
☆91 · Updated 3 months ago
Alternatives and similar repositories for xlstm-jax:
Users interested in xlstm-jax are comparing it to the libraries listed below.
- 🧱 Modula software package ☆188 · Updated 2 weeks ago
- ☆215 · Updated 9 months ago
- ☆289 · Updated 3 months ago
- Cost-aware hyperparameter tuning algorithm ☆150 · Updated 9 months ago
- A State-Space Model with Rational Transfer Function Representation. ☆78 · Updated 11 months ago
- ☆173 · Updated 4 months ago
- ☆150 · Updated 8 months ago
- The simplest, fastest repository for training/finetuning medium-sized GPTs. ☆103 · Updated 4 months ago
- Tiled Flash Linear Attention library for fast and efficient mLSTM kernels. ☆53 · Updated last week
- Simple, minimal implementation of the Mamba SSM in one PyTorch file. Using logcumsumexp (Heisen sequence). ☆112 · Updated 5 months ago
- Normalized Transformer (nGPT) ☆167 · Updated 4 months ago
- The AdEMAMix Optimizer: Better, Faster, Older. ☆180 · Updated 7 months ago
- Latent Program Network (from the "Searching Latent Program Spaces" paper) ☆80 · Updated last month
- Explorations into the proposal from the paper "Grokfast: Accelerated Grokking by Amplifying Slow Gradients" ☆98 · Updated 3 months ago
- Supporting PyTorch FSDP for optimizers ☆80 · Updated 4 months ago
- Accelerated First Order Parallel Associative Scan ☆180 · Updated 7 months ago
- Understand and test language model architectures on synthetic tasks. ☆191 · Updated last month
- $100K or 100 Days: Trade-offs when Pre-Training with Academic Resources ☆135 · Updated last month
- WIP ☆93 · Updated 8 months ago
- Evaluating the Mamba architecture on the Othello game ☆46 · Updated 11 months ago
- ☆98 · Updated last week
- Quick implementation of nGPT, learning entirely on the hypersphere, from Nvidia AI ☆279 · Updated 3 weeks ago
- Scalable and Performant Data Loading ☆235 · Updated this week
- ☆212 · Updated this week
- DeMo: Decoupled Momentum Optimization ☆186 · Updated 4 months ago
- Minimal (400 LOC) implementation of Maximum (multi-node, FSDP) GPT training ☆123 · Updated last year
- A MAD laboratory to improve AI architecture designs 🧪 ☆109 · Updated 4 months ago
- CIFAR-10 speedruns: 94% in 2.6 seconds and 96% in 27 seconds ☆228 · Updated last month
- seqax = sequence modeling + JAX ☆153 · Updated last week
- ☆92 · Updated 2 months ago