OpenNLPLab / HGRN
[NeurIPS 2023 spotlight] Official implementation of HGRN in our NeurIPS 2023 paper - Hierarchically Gated Recurrent Neural Network for Sequence Modeling
☆62 Updated 9 months ago
Alternatives and similar repositories for HGRN:
Users who are interested in HGRN are comparing it to the libraries listed below
- ☆47 Updated last year
- ☆49 Updated 7 months ago
- HGRN2: Gated Linear RNNs with State Expansion ☆52 Updated 6 months ago
- ☆51 Updated 2 years ago
- ☆33 Updated last year
- ☆28 Updated 7 months ago
- Official repository of paper "RNNs Are Not Transformers (Yet): The Key Bottleneck on In-context Retrieval" ☆25 Updated 10 months ago
- ☆24 Updated 4 months ago
- [ICLR 2023] "Sparse MoE as the New Dropout: Scaling Dense and Self-Slimmable Transformers" by Tianlong Chen*, Zhenyu Zhang*, Ajay Jaiswal… ☆48 Updated last year
- ☆28 Updated 3 months ago
- ☆99 Updated 11 months ago
- ☆30 Updated 11 months ago
- [EMNLP 2022] Official implementation of Transnormer in our EMNLP 2022 paper - The Devil in Linear Transformer ☆59 Updated last year
- ☆20 Updated 2 years ago
- Jax implementation of "Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models" ☆13 Updated 9 months ago
- Official Implementation Of The Paper: "DeciMamba: Exploring the Length Extrapolation Potential of Mamba" ☆23 Updated 6 months ago
- Official PyTorch Implementation of the Longhorn Deep State Space Model ☆48 Updated 2 months ago
- This is the official implementation of "DAPE: Data-Adaptive Positional Encoding for Length Extrapolation" ☆35 Updated 4 months ago
- Stick-breaking attention ☆43 Updated last month
- [ICLR 2023] Official implementation of TNN in our ICLR 2023 paper - Toeplitz Neural Network for Sequence Modeling ☆78 Updated 9 months ago
- ☆20 Updated last year
- ☆51 Updated 9 months ago
- ☆19 Updated last year
- Mixture of Attention Heads ☆41 Updated 2 years ago
- Here we will test various linear attention designs. ☆58 Updated 9 months ago
- ☆44 Updated last year
- Griffin MQA + Hawk Linear RNN Hybrid ☆85 Updated 9 months ago
- Implementation of Gated State Spaces, from the paper "Long Range Language Modeling via Gated State Spaces", in Pytorch ☆97 Updated last year
- 🔥 A minimal training framework for scaling FLA models ☆55 Updated this week
- Code for paper "Diffusion Language Models Can Perform Many Tasks with Scaling and Instruction-Finetuning" ☆65 Updated last year