NX-AI / xlstm-jax
Official JAX implementation of xLSTM including fast and efficient training and inference code. 7B model available at https://huggingface.co/NX-AI/xLSTM-7b.
☆105 Updated last year
Alternatives and similar repositories for xlstm-jax
Users who are interested in xlstm-jax compare it to the libraries listed below.
- ☆287 Updated last year
- The AdEMAMix Optimizer: Better, Faster, Older. ☆186 Updated last year
- Cost aware hyperparameter tuning algorithm ☆177 Updated last year
- 🧱 Modula software package ☆322 Updated 5 months ago
- Jax Codebase for Evolutionary Strategies at the Hyperscale ☆213 Updated 3 weeks ago
- ☆159 Updated 2 months ago
- Getting crystal-like representations with harmonic loss ☆195 Updated 9 months ago
- ☆238 Updated last month
- ☆235 Updated last year
- ☆314 Updated last year
- Latent Program Network (from the "Searching Latent Program Spaces" paper) ☆107 Updated last month
- A State-Space Model with Rational Transfer Function Representation. ☆83 Updated last year
- Accelerated First Order Parallel Associative Scan ☆193 Updated last week
- The simplest, fastest repository for training/finetuning medium-sized GPTs. ☆185 Updated 6 months ago
- CIFAR-10 speedruns: 94% in 2.6 seconds and 96% in 27 seconds ☆344 Updated 2 months ago
- A MAD laboratory to improve AI architecture designs 🧪 ☆136 Updated last year
- ☆70 Updated last year
- Official repository for the paper "Grokfast: Accelerated Grokking by Amplifying Slow Gradients" ☆572 Updated last year
- FlashRNN - Fast RNN Kernels with I/O Awareness ☆174 Updated 2 months ago
- Reliable, minimal and scalable library for pretraining foundation and world models ☆117 Updated this week
- Simple, minimal implementation of the Mamba SSM in one pytorch file. Using logcumsumexp (Heisen sequence). ☆129 Updated last year
- Dion optimizer algorithm ☆416 Updated 2 weeks ago
- Modular, scalable library to train ML models ☆187 Updated this week
- Efficient optimizers ☆281 Updated 3 weeks ago
- supporting pytorch FSDP for optimizers ☆84 Updated last year
- Understand and test language model architectures on synthetic tasks. ☆249 Updated last week
- An implementation of PSGD Kron second-order optimizer for PyTorch ☆98 Updated 5 months ago
- PyTorch Code for Energy-Based Transformers paper -- generalizable reasoning and scalable learning ☆576 Updated 2 months ago
- $100K or 100 Days: Trade-offs when Pre-Training with Academic Resources ☆150 Updated 3 months ago
- ☆62 Updated last year