NX-AI / xlstm-jax
Official JAX implementation of xLSTM including fast and efficient training and inference code. 7B model available at https://huggingface.co/NX-AI/xLSTM-7b.
105 stars · updated 11 months ago
Alternatives and similar repositories for xlstm-jax
Users interested in xlstm-jax also compare it with the libraries listed below.
- 🧱 Modula software package · 309 stars · updated 3 months ago
- 225 stars · updated last year
- 152 stars · updated last month
- The simplest, fastest repository for training/fine-tuning medium-sized GPTs · 174 stars · updated 5 months ago
- Efficient optimizers · 275 stars · updated last month
- Supporting PyTorch FSDP for optimizers · 84 stars · updated last year
- Dion optimizer algorithm · 403 stars · updated this week
- 285 stars · updated last year
- A MAD laboratory to improve AI architecture designs 🧪 · 135 stars · updated 11 months ago
- Getting crystal-like representations with harmonic loss · 192 stars · updated 8 months ago
- CIFAR-10 speedruns: 94% in 2.6 seconds and 96% in 27 seconds · 330 stars · updated 3 weeks ago
- FlashRNN - Fast RNN Kernels with I/O Awareness · 171 stars · updated last month
- Accelerated First Order Parallel Associative Scan · 192 stars · updated last year
- 68 stars · updated last year
- DeMo: Decoupled Momentum Optimization · 197 stars · updated last year
- JAX codebase for Evolutionary Strategies at the Hyperscale · 181 stars · updated 3 weeks ago
- Cost-aware hyperparameter tuning algorithm · 176 stars · updated last year
- Simple, minimal implementation of the Mamba SSM in one PyTorch file, using logcumsumexp (Heisen sequence) · 128 stars · updated last year
- A State-Space Model with Rational Transfer Function Representation · 83 stars · updated last year
- The AdEMAMix Optimizer: Better, Faster, Older · 186 stars · updated last year
- 231 stars · updated last week
- 105 stars · updated 4 months ago
- Understand and test language model architectures on synthetic tasks · 243 stars · updated 2 months ago
- An implementation of the PSGD Kron second-order optimizer for PyTorch · 97 stars · updated 4 months ago
- Latent Program Network (from the "Searching Latent Program Spaces" paper) · 106 stars · updated 2 weeks ago
- Explorations into the proposal from the paper "Grokfast, Accelerated Grokking by Amplifying Slow Gradients" · 103 stars · updated 11 months ago
- 121 stars · updated 6 months ago
- This repo contains the source code for the paper "Evolution Strategies at Scale: LLM Fine-Tuning Beyond Reinforcement Learning" · 272 stars · updated 2 weeks ago
- Normalized Transformer (nGPT) · 193 stars · updated last year
- Supporting code for the blog post on modular manifolds · 104 stars · updated 2 months ago