MLGroupJLU / RWKV-Survey
The official GitHub page for the survey paper "A Survey of RWKV".
☆25 · Updated 3 months ago
Alternatives and similar repositories for RWKV-Survey:
Users interested in RWKV-Survey are comparing it to the repositories listed below.
- [ICML 2024] When Linear Attention Meets Autoregressive Decoding: Towards More Effective and Efficient Linearized Large Language Models ☆30 · Updated 10 months ago
- A repository for DenseSSMs ☆87 · Updated last year
- Official implementation of RMoE (Layerwise Recurrent Router for Mixture-of-Experts) ☆20 · Updated 8 months ago
- ☆16 · Updated 2 years ago
- Triton implementation of bidirectional (non-causal) linear attention ☆46 · Updated 2 months ago
- HGRN2: Gated Linear RNNs with State Expansion ☆54 · Updated 8 months ago
- DeciMamba: Exploring the Length Extrapolation Potential of Mamba (ICLR 2025) ☆26 · Updated 2 weeks ago
- [ICML 2024 Oral] Official implementation of "Accurate LoRA-Finetuning Quantization of LLMs via Information Retention" ☆65 · Updated last year
- PyTorch implementation of the paper "Learning to (Learn at Test Time): RNNs with Expressive Hidden States" ☆24 · Updated this week
- [NAACL 2025] A Closer Look into Mixture-of-Experts in Large Language Models ☆51 · Updated 2 months ago
- Official implementation of the ICML 2024 paper RoSA (Robust Adaptation) ☆40 · Updated last year
- Official PyTorch implementation of "LayerMerge: Neural Network Depth Compression through Layer Pruning and Merging" (ICML'24) ☆29 · Updated 8 months ago
- [NeurIPS 2023] Make Your Pre-trained Model Reversible: From Parameter to Memory Efficient Fine-Tuning ☆31 · Updated last year
- Mixture of Attention Heads ☆44 · Updated 2 years ago
- [ICLR 2024] Official PyTorch implementation of "QLLM: Accurate and Efficient Low-Bitwidth Quantization for Large Language Models" ☆27 · Updated last year
- [ICLR 2025] Official code release for "Explaining Modern Gated-Linear RNNs via a Unified Implicit Attention Formulation" ☆42 · Updated last month
- Papers for a comprehensive survey on accelerated generation techniques in Large Language Models (LLMs) ☆11 · Updated 11 months ago
- ☆48 · Updated last year
- [Preprint] "Why is the State of Neural Network Pruning so Confusing? On the Fairness, Comparison Setup, and Trainability in Network Pruning" ☆40 · Updated 2 years ago
- ☆14 · Updated last year
- [ACL 2023] PuMer: Pruning and Merging Tokens for Efficient Vision Language Models ☆29 · Updated 6 months ago
- Implementation of the paper "Training-Free Pretrained Model Merging" (CVPR 2024) ☆29 · Updated last year
- [CVPR 2025] Breaking the Low-Rank Dilemma of Linear Attention ☆16 · Updated last month
- State Space Models ☆69 · Updated 11 months ago
- PyTorch implementation of StableMask (ICML'24) ☆12 · Updated 9 months ago
- ☆57 · Updated 2 months ago
- Official implementation of the paper "A Deeper Look at Depth Pruning of LLMs" ☆15 · Updated 9 months ago
- BESA, a differentiable weight-pruning technique for large language models ☆16 · Updated last year
- Official implementation of "MetaLA: Unified Optimal Linear Approximation to Softmax Attention Map" (NeurIPS 2024 Oral) ☆22 · Updated 3 months ago