NX-AI / xlstm-jax
Official JAX implementation of xLSTM including fast and efficient training and inference code. 7B model available at https://huggingface.co/NX-AI/xLSTM-7b.
⭐101 · Updated 7 months ago
Alternatives and similar repositories for xlstm-jax
Users who are interested in xlstm-jax are comparing it to the libraries listed below.
- 🧱 Modula software package · ⭐225 · Updated last week
- ⭐275 · Updated last year
- ⭐207 · Updated 8 months ago
- Cost-aware hyperparameter tuning algorithm · ⭐168 · Updated last year
- ⭐101 · Updated last month
- Getting crystal-like representations with harmonic loss · ⭐194 · Updated 4 months ago
- The AdEMAMix Optimizer: Better, Faster, Older · ⭐185 · Updated 11 months ago
- A MAD laboratory to improve AI architecture designs 🧪 · ⭐125 · Updated 8 months ago
- Efficient optimizers · ⭐256 · Updated 3 weeks ago
- Supporting PyTorch FSDP for optimizers · ⭐84 · Updated 8 months ago
- The simplest, fastest repository for training/finetuning medium-sized GPTs · ⭐153 · Updated 2 months ago
- ⭐298 · Updated 7 months ago
- Simple, minimal implementation of the Mamba SSM in one PyTorch file, using logcumsumexp (Heisen sequence) · ⭐121 · Updated 10 months ago
- Understand and test language model architectures on synthetic tasks · ⭐222 · Updated last month
- Official repository for the paper "Grokfast: Accelerated Grokking by Amplifying Slow Gradients" · ⭐560 · Updated last year
- Accelerate and optimize performance with streamlined training and serving options in JAX · ⭐301 · Updated last week
- A State-Space Model with Rational Transfer Function Representation · ⭐79 · Updated last year
- An implementation of the PSGD Kron second-order optimizer for PyTorch · ⭐96 · Updated last month
- CIFAR-10 speedruns: 94% in 2.6 seconds and 96% in 27 seconds · ⭐284 · Updated last month
- Accelerated First Order Parallel Associative Scan · ⭐187 · Updated last year
- Dion optimizer algorithm · ⭐305 · Updated this week
- Latent Program Network (from the "Searching Latent Program Spaces" paper) · ⭐93 · Updated 5 months ago
- Normalized Transformer (nGPT) · ⭐186 · Updated 9 months ago
- FlashRNN: Fast RNN Kernels with I/O Awareness · ⭐94 · Updated 2 months ago
- Annotated version of the Mamba paper · ⭐488 · Updated last year
- PyTorch code for the Energy-Based Transformers paper: generalizable reasoning and scalable learning · ⭐428 · Updated last month
- Legible, Scalable, Reproducible Foundation Models with Named Tensors and JAX · ⭐646 · Updated this week
- Training small GPT-2 style models using Kolmogorov-Arnold networks · ⭐121 · Updated last year
- Tiled Flash Linear Attention library for fast and efficient mLSTM kernels · ⭐69 · Updated 2 weeks ago
- Evaluating the Mamba architecture on the Othello game · ⭐48 · Updated last year