NX-AI / xlstm-jax
Official JAX implementation of xLSTM, including fast and efficient training and inference code. A 7B model is available at https://huggingface.co/NX-AI/xLSTM-7b.
⭐104 · Updated 9 months ago
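For readers who want to try the released checkpoint directly, here is a minimal loading sketch. It assumes the 7B weights load through Hugging Face transformers' Auto classes; the repository itself provides the JAX training and inference code, so the transformers path shown here is an illustrative assumption, not this repo's API:

```python
# Minimal sketch: load the published 7B checkpoint via Hugging Face transformers
# (assumes a transformers version with xLSTM support and enough GPU/CPU memory).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "NX-AI/xLSTM-7b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Generate a short continuation as a smoke test.
inputs = tokenizer("xLSTM is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```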
Alternatives and similar repositories for xlstm-jax
Users interested in xlstm-jax are comparing it to the libraries listed below.
- 🧱 Modula software package ⭐282 · Updated last month
- Cost-aware hyperparameter tuning algorithm ⭐170 · Updated last year
- ⭐282 · Updated last year
- ⭐216 · Updated 10 months ago
- The simplest, fastest repository for training/finetuning medium-sized GPTs. ⭐164 · Updated 3 months ago
- Efficient optimizers ⭐265 · Updated last week
- CIFAR-10 speedruns: 94% in 2.6 seconds and 96% in 27 seconds ⭐303 · Updated 2 months ago
- Supporting PyTorch FSDP for optimizers ⭐83 · Updated 10 months ago
- A MAD laboratory to improve AI architecture designs 🧪 ⭐129 · Updated 9 months ago
- Normalized Transformer (nGPT) ⭐190 · Updated 10 months ago
- The AdEMAMix Optimizer: Better, Faster, Older. ⭐186 · Updated last year
- Accelerate and optimize performance with streamlined training and serving options in JAX. ⭐310 · Updated last week
- Getting crystal-like representations with harmonic loss ⭐194 · Updated 6 months ago
- DeMo: Decoupled Momentum Optimization ⭐192 · Updated 10 months ago
- ⭐58 · Updated last year
- Official repository for the paper "Grokfast: Accelerated Grokking by Amplifying Slow Gradients" ⭐562 · Updated last year
- Latent Program Network (from the "Searching Latent Program Spaces" paper) ⭐98 · Updated last week
- ⭐67 · Updated 10 months ago
- Dion optimizer algorithm ⭐361 · Updated last week
- An implementation of PSGD Kron second-order optimizer for PyTorch ⭐95 · Updated 2 months ago
- Annotated version of the Mamba paper ⭐489 · Updated last year
- Understand and test language model architectures on synthetic tasks. ⭐231 · Updated 2 weeks ago
- ⭐303 · Updated 9 months ago
- Attention Kernels for Symmetric Power Transformers ⭐120 · Updated 2 weeks ago
- Accelerated First Order Parallel Associative Scan ⭐189 · Updated last year (a generic illustration of the parallel-scan idea appears after this list)
- Simple, minimal implementation of the Mamba SSM in one PyTorch file, using logcumsumexp (Heisen sequence). ⭐122 · Updated 11 months ago
- Small Batch Size Training for Language Models ⭐62 · Updated last week
- PyTorch code for the Energy-Based Transformers paper: generalizable reasoning and scalable learning ⭐524 · Updated 2 weeks ago
- A State-Space Model with Rational Transfer Function Representation. ⭐81 · Updated last year
- ⭐188 · Updated last month
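Several entries above (the associative-scan and Mamba/SSM repositories) revolve around the same primitive: evaluating a first-order linear recurrence h_t = a_t·h_{t-1} + b_t in parallel rather than step by step. As a generic illustration only, not code from any listed repository, JAX's built-in `jax.lax.associative_scan` computes such a recurrence in logarithmic depth:

```python
# Illustrative sketch: a first-order linear recurrence via parallel associative
# scan in JAX. Assumes h_0 = 0; all names here are for demonstration only.
import jax
import jax.numpy as jnp

def combine(left, right):
    # Composing h -> a_l*h + b_l followed by h -> a_r*h + b_r gives
    # h -> (a_r*a_l)*h + (a_r*b_l + b_r), which is associative.
    a_l, b_l = left
    a_r, b_r = right
    return a_r * a_l, a_r * b_l + b_r

def linear_recurrence(a, b):
    # Scans over the leading (time) axis and returns all prefix states h_1..h_n.
    _, h = jax.lax.associative_scan(combine, (a, b))
    return h

# Sanity check against the sequential definition.
key = jax.random.PRNGKey(0)
a = jax.random.uniform(key, (8,))
b = jnp.ones(8)
h = linear_recurrence(a, b)

state, h_seq = 0.0, []
for a_t, b_t in zip(a, b):
    state = a_t * state + b_t
    h_seq.append(state)
print(jnp.allclose(h, jnp.stack(h_seq)))  # True (up to float tolerance)
```

The same trick underlies fast SSM and linear-attention implementations: because the combine function is associative, the scan parallelizes across the sequence dimension instead of running a serial loop.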