lucidrains / minGRU-pytorch
Implementation of the proposed minGRU in Pytorch
☆282 · Updated last week
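For orientation, here is a minimal sketch of the minGRU recurrence as stated in the paper "Were RNNs All We Needed?". It is not code from this repository. In a standard GRU the gates depend on the previous hidden state. In minGRU the update gate z_t = σ(Linear_z(x_t)) and the candidate state h̃_t = Linear_h(x_t) depend only on x_t, and the state update is h_t = (1 − z_t) ⊙ h_{t−1} + z_t ⊙ h̃_t. The sketch uses plain Python instead of PyTorch so that it runs anywhere. It processes the sequence one step at a time. The helper names (`linear`, `min_gru`) and the list-of-lists weight layout are hypothetical.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]
Vector = List[float]


def linear(w: Matrix, b: Vector, x: Sequence[float]) -> Vector:
    # Hypothetical dense layer: out[j] = sum_i w[j][i] * x[i] + b[j]
    return [sum(wji * xi for wji, xi in zip(row, x)) + bj for row, bj in zip(w, b)]


def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


def min_gru(xs: Sequence[Sequence[float]], w_z: Matrix, b_z: Vector,
            w_h: Matrix, b_h: Vector, h0: Vector) -> List[Vector]:
    """Sequential (step-by-step) minGRU sketch.

    The gate z_t and candidate h~_t are computed from x_t alone, never from
    h_{t-1}, so the update is linear in h and could be parallel-scanned.
    """
    h = list(h0)
    outputs: List[Vector] = []
    for x in xs:
        z = [sigmoid(v) for v in linear(w_z, b_z, x)]  # update gate from x_t only
        h_tilde = linear(w_h, b_h, x)                   # candidate state from x_t only
        # h_t = (1 - z_t) * h_{t-1} + z_t * h~_t, elementwise
        h = [(1.0 - zi) * hi + zi * hti for zi, hi, hti in zip(z, h, h_tilde)]
        outputs.append(h)
    return outputs
```

With all weights and the gate bias set to zero, the gate is σ(0) = 0.5 and the candidate equals b_h. Starting from h_0 = [0.0] with b_h = [1.0], the state therefore moves 0.5, 0.75, 0.875 over three steps.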
Alternatives and similar repositories for minGRU-pytorch:
Users who are interested in minGRU-pytorch are comparing it to the libraries listed below.
- Simple, minimal implementation of the Mamba SSM in one pytorch file. Using logcumsumexp (Heisen sequence). ☆114 · Updated 5 months ago
- Implementation of Rotary Embeddings, from the Roformer paper, in Pytorch ☆650 · Updated 3 months ago
- Implementation of the proposed Adam-atan2 from Google Deepmind in Pytorch ☆101 · Updated 3 months ago
- PyTorch implementation of Structured State Space for Sequence Modeling (S4), based on Annotated S4. ☆77 · Updated last year
- Attempt to make multiple residual streams from Bytedance's Hyper-Connections paper accessible to the public ☆77 · Updated last month
- A practical implementation of GradNorm, Gradient Normalization for Adaptive Loss Balancing, in Pytorch ☆87 · Updated last year
- [ICLR2025 Spotlight🔥] Official Implementation of TokenFormer: Rethinking Transformer Scaling with Tokenized Model Parameters ☆535 · Updated last month
- [ICLR 2025] Official PyTorch Implementation of Gated Delta Networks: Improving Mamba2 with Delta Rule ☆142 · Updated this week
- Pytorch implementation of Simplified Structured State-Spaces for Sequence Modeling (S5) ☆75 · Updated 10 months ago
- Code for Adam-mini: Use Fewer Learning Rates To Gain More https://arxiv.org/abs/2406.16793 ☆398 · Updated 3 months ago
- Quick implementation of nGPT, learning entirely on the hypersphere, from NvidiaAI ☆276 · Updated this week
- Implementation of Block Recurrent Transformer - Pytorch ☆218 · Updated 7 months ago
- Integrating Mamba/SSMs with Transformer for Enhanced Long Context and High-Quality Sequence Modeling ☆188 · Updated last month
- Just some miscellaneous utility functions / decorators / modules related to Pytorch and Accelerate to help speed up implementation of new… ☆120 · Updated 7 months ago
- An implementation of local windowed attention for language modeling ☆428 · Updated 2 months ago
- Pytorch implementation of the xLSTM model by Beck et al. (2024) ☆159 · Updated 7 months ago
- FlashFFTConv: Efficient Convolutions for Long Sequences with Tensor Cores ☆306 · Updated 2 months ago
- Accelerated First Order Parallel Associative Scan ☆175 · Updated 7 months ago
- Helpful tools and examples for working with flex-attention ☆689 · Updated last week
- Official Implementation of "ADOPT: Modified Adam Can Converge with Any β2 with the Optimal Rate" ☆419 · Updated 3 months ago
- Reading list for research topics in state-space models ☆267 · Updated 2 months ago
- When it comes to optimizers, it's always better to be safe than sorry ☆214 · Updated 3 weeks ago
- Implementation of ST-Moe, the latest incarnation of MoE after years of research at Brain, in Pytorch ☆318 · Updated 9 months ago
- Official implementation of "Hydra: Bidirectional State Space Models Through Generalized Matrix Mixers" ☆125 · Updated last month
- Explorations into the recently proposed Taylor Series Linear Attention ☆94 · Updated 7 months ago
- The AdEMAMix Optimizer: Better, Faster, Older. ☆179 · Updated 6 months ago
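The first alternative listed computes a linear recurrence with logcumsumexp (the "Heisen sequence"). The minGRU update has the form h_t = a_t·h_{t−1} + b_t, with a_t = 1 − z_t and b_t = z_t·h̃_t. Unrolling gives h_t = A_t·(h_0 + Σ_{k≤t} b_k / A_k), where A_t = Π_{k≤t} a_k. In log space this becomes log h_t = Σ_{k≤t} log a_k + logcumsumexp([log h_0, log b_1 − log A_1, …])_t.

The sketch below implements this and checks it against the plain sequential loop. It assumes every a_t, b_t and h_0 is strictly positive, so only real logarithms are needed. Negative values would need extra handling, which is not shown. The cumulative sums are written as Python loops to expose the math. In PyTorch they would correspond to `torch.cumsum` and `torch.logcumsumexp`, which is where the parallelism comes from. All function names here are hypothetical.

```python
import math
from typing import List, Sequence


def log_cumsum_exp(xs: Sequence[float]) -> List[float]:
    # Running log(sum(exp(x_0..x_t))), computed stably via the max trick.
    out: List[float] = []
    running = -math.inf
    for x in xs:
        m = max(running, x)
        running = m + math.log(math.exp(running - m) + math.exp(x - m))
        out.append(running)
    return out


def heisen_scan(log_a: Sequence[float], log_b: Sequence[float], log_h0: float) -> List[float]:
    """Return log h_t for h_t = a_t * h_{t-1} + b_t (assumes a_t, b_t, h_0 > 0)."""
    # a_star[t] = sum_{k<=t} log a_k, i.e. log A_t
    a_star: List[float] = []
    s = 0.0
    for la in log_a:
        s += la
        a_star.append(s)
    # Prepend log h_0 (A_0 = 1), then the terms log b_t - log A_t
    terms = [log_h0] + [lb - ls for lb, ls in zip(log_b, a_star)]
    lse = log_cumsum_exp(terms)[1:]  # drop the h_0-only prefix
    return [ls + l for ls, l in zip(a_star, lse)]


def sequential_scan(a: Sequence[float], b: Sequence[float], h0: float) -> List[float]:
    # Reference implementation: the plain step-by-step recurrence.
    h, out = h0, []
    for at, bt in zip(a, b):
        h = at * h + bt
        out.append(h)
    return out
```

For example, with a = [0.9, 0.5, 0.2], b = [0.1, 0.3, 0.4] and h_0 = 2.0, both paths give approximately 1.9, 1.25 and 0.65 once the log-space result is exponentiated.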