MLGroupJLU / RWKV-Survey
The official GitHub page for the survey paper "A Survey of RWKV".
☆22 Updated last month
Alternatives and similar repositories for RWKV-Survey:
Users who are interested in RWKV-Survey are comparing it to the repositories listed below:
- DeciMamba: Exploring the Length Extrapolation Potential of Mamba (ICLR 2025) ☆23 Updated 7 months ago
- A repository for DenseSSMs ☆87 Updated 10 months ago
- HGRN2: Gated Linear RNNs with State Expansion ☆53 Updated 6 months ago
- [ICLR 2025] Official Code Release for Explaining Modern Gated-Linear RNNs via a Unified Implicit Attention Formulation ☆37 Updated this week
- [ICML 2024] When Linear Attention Meets Autoregressive Decoding: Towards More Effective and Efficient Linearized Large Language Models ☆29 Updated 8 months ago
- [NeurIPS 2023 spotlight] Official implementation of HGRN in our NeurIPS 2023 paper - Hierarchically Gated Recurrent Neural Network for Se… ☆64 Updated 10 months ago
- This repository contains papers for a comprehensive survey on accelerated generation techniques in Large Language Models (LLMs). ☆12 Updated 9 months ago
- A simple implementation of [Mamba: Linear-Time Sequence Modeling with Selective State Spaces](https://arxiv.org/abs/2312.00752) ☆21 Updated last year
- Official implementation of RMoE (Layerwise Recurrent Router for Mixture-of-Experts) ☆17 Updated 7 months ago
- PyTorch implementation of the paper: "Learning to (Learn at Test Time): RNNs with Expressive Hidden States" ☆24 Updated 3 weeks ago
- Here we will test various linear attention designs. ☆59 Updated 10 months ago
- Triton implementation of bi-directional (non-causal) linear attention ☆42 Updated last month
- Mixture of Attention Heads ☆41 Updated 2 years ago
- The official PyTorch implementation of the paper "Fourier Transformer: Fast Long Range Modeling by Removing Sequence Redundancy with FFT … ☆32 Updated 11 months ago
- Scaling Sparse Fine-Tuning to Large Language Models ☆16 Updated last year
- [NAACL 2025] Representing Rule-based Chatbots with Transformers ☆19 Updated 3 weeks ago
- The official repository for our paper "The Dual Form of Neural Networks Revisited: Connecting Test Time Predictions to Training Patterns … ☆16 Updated last year
- Implementation of Griffin from the paper: "Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models" ☆51 Updated last month
- [ICLR 2023] "Sparse MoE as the New Dropout: Scaling Dense and Self-Slimmable Transformers" by Tianlong Chen*, Zhenyu Zhang*, Ajay Jaiswal… ☆48 Updated 2 years ago
- Implementation of the model "Hedgehog" from the paper: "The Hedgehog & the Porcupine: Expressive Linear Attentions with Softmax Mimicry" ☆13 Updated 11 months ago
- Curse-of-memory phenomenon of RNNs in sequence modelling ☆19 Updated last week