NVlabs / GatedDeltaNet
Official PyTorch Implementation of Gated Delta Networks: Improving Mamba2 with Delta Rule
☆45 · Updated this week
Alternatives and similar repositories for GatedDeltaNet:
Users who are interested in GatedDeltaNet are comparing it to the repositories listed below.
- HGRN2: Gated Linear RNNs with State Expansion ☆49 · Updated 3 months ago
- Stick-breaking attention ☆37 · Updated this week
- Here we will test various linear attention designs. ☆56 · Updated 7 months ago
- Official implementation of Phi-Mamba, a MOHAWK-distilled model (Transformers to SSMs: Distilling Quadratic Knowledge to Subquadratic Mode… ☆84 · Updated 3 months ago
- ☆25 · Updated 9 months ago
- Official repository of the paper "RNNs Are Not Transformers (Yet): The Key Bottleneck on In-context Retrieval" ☆25 · Updated 8 months ago
- Official code for the paper "Attention as a Hypernetwork" ☆24 · Updated 5 months ago
- Official PyTorch Implementation of the Longhorn Deep State Space Model ☆42 · Updated 2 weeks ago
- ☆46 · Updated 10 months ago
- Official implementation of "Hydra: Bidirectional State Space Models Through Generalized Matrix Mixers" ☆111 · Updated 4 months ago
- Why Do We Need Weight Decay in Modern Deep Learning? [NeurIPS 2024] ☆53 · Updated 2 months ago
- The official implementation of "DAPE: Data-Adaptive Positional Encoding for Length Extrapolation" ☆33 · Updated 2 months ago
- [NeurIPS 2023 spotlight] Official implementation of HGRN in our NeurIPS 2023 paper - Hierarchically Gated Recurrent Neural Network for Se… ☆63 · Updated 7 months ago
- ☆64 · Updated 3 months ago
- Official implementation of the paper "DeciMamba: Exploring the Length Extrapolation Potential of Mamba" ☆22 · Updated 4 months ago
- ☆36 · Updated 6 months ago
- ☆45 · Updated 5 months ago
- Implementation of 🥥 Coconut, Chain of Continuous Thought, in PyTorch ☆60 · Updated this week
- ☆60 · Updated last month
- APOLLO: SGD-like Memory, AdamW-level Performance ☆62 · Updated this week
- Implementation of Infini-Transformer in PyTorch ☆106 · Updated 2 months ago
- ☆48 · Updated 2 months ago
- DiffuGPT and DiffuLLaMA: Scaling Diffusion Language Models via Adaptation from Autoregressive Models ☆64 · Updated 3 weeks ago
- ☆98 · Updated 9 months ago
- ☆19 · Updated this week
- ☆36 · Updated 8 months ago
- Griffin MQA + Hawk Linear RNN Hybrid ☆85 · Updated 7 months ago
- ☆15 · Updated 5 months ago
- This repo is based on https://github.com/jiaweizzhao/GaLore ☆21 · Updated 3 months ago
- A repository for DenseSSMs ☆87 · Updated 8 months ago