NVlabs / GatedDeltaNet
[ICLR 2025] Official PyTorch Implementation of Gated Delta Networks: Improving Mamba2 with Delta Rule
☆131 Updated last week
Alternatives and similar repositories for GatedDeltaNet:
Users who are interested in GatedDeltaNet are comparing it to the libraries listed below.
- Official implementation of Phi-Mamba. A MOHAWK-distilled model (Transformers to SSMs: Distilling Quadratic Knowledge to Subquadratic Mode… ☆96 Updated 5 months ago
- ☆253 Updated 5 months ago
- FlashRNN - Fast RNN Kernels with I/O Awareness ☆75 Updated 2 months ago
- Official implementation of "Hydra: Bidirectional State Space Models Through Generalized Matrix Mixers" ☆123 Updated 3 weeks ago
- Normalized Transformer (nGPT) ☆152 Updated 3 months ago
- Quick implementation of nGPT, learning entirely on the hypersphere, from NvidiaAI ☆271 Updated 3 months ago
- Some preliminary explorations of Mamba's context scaling. ☆213 Updated last year
- Griffin MQA + Hawk Linear RNN Hybrid ☆85 Updated 9 months ago
- When it comes to optimizers, it's always better to be safe than sorry ☆179 Updated 3 weeks ago
- Accelerated First Order Parallel Associative Scan ☆171 Updated 6 months ago
- [NeurIPS 2024] Official Repository of The Mamba in the Llama: Distilling and Accelerating Hybrid Models ☆196 Updated 3 weeks ago
- Stick-breaking attention ☆43 Updated last month
- PyTorch implementation of the PEER block from the paper, Mixture of A Million Experts, by Xu Owen He at DeepMind ☆118 Updated 5 months ago
- Code for exploring Based models from "Simple linear attention language models balance the recall-throughput tradeoff" ☆221 Updated this week
- ☆71 Updated 6 months ago
- Implementation of the proposed Adam-atan2 from Google DeepMind in PyTorch ☆102 Updated 2 months ago
- Understand and test language model architectures on synthetic tasks. ☆181 Updated last month
- Implementation of Infini-Transformer in PyTorch ☆109 Updated last month
- ☆52 Updated 4 months ago
- Explorations into the recently proposed Taylor Series Linear Attention ☆92 Updated 6 months ago
- Fast and memory-efficient exact attention ☆58 Updated this week
- Official PyTorch Implementation of the Longhorn Deep State Space Model ☆48 Updated 2 months ago
- ☆71 Updated 5 months ago
- A State-Space Model with Rational Transfer Function Representation. ☆77 Updated 9 months ago
- Why Do We Need Weight Decay in Modern Deep Learning? [NeurIPS 2024] ☆60 Updated 4 months ago
- Implementation of a multimodal diffusion transformer in PyTorch ☆100 Updated 7 months ago
- ☆158 Updated 2 months ago