annosubmission / GRC-Cache
☆16 · Updated 2 years ago
Alternatives and similar repositories for GRC-Cache
Users interested in GRC-Cache are comparing it to the libraries listed below
- Curse-of-memory phenomenon of RNNs in sequence modelling ☆19 · Updated last month
- ☆17 · Updated 5 months ago
- ☆23 · Updated 9 months ago
- ☆47 · Updated last year
- Code for paper: "LASeR: Learning to Adaptively Select Reward Models with Multi-Arm Bandits" ☆13 · Updated 8 months ago
- The official implementation of "DAPE: Data-Adaptive Positional Encoding for Length Extrapolation" ☆38 · Updated 8 months ago
- User-friendly implementation of the Mixture-of-Sparse-Attention (MoSA). MoSA selects distinct tokens for each head with expert choice rou… ☆21 · Updated last month
- [ICLR'25] "Understanding Bottlenecks of State Space Models through the Lens of Recency and Over-smoothing" by Peihao Wang, Ruisi Cai, Yue… ☆12 · Updated 3 months ago
- A repository for DenseSSMs ☆87 · Updated last year
- [ICML 2023] "Data Efficient Neural Scaling Law via Model Reusing" by Peihao Wang, Rameswar Panda, Zhangyang Wang ☆14 · Updated last year
- [ICLR 2025] Official Pytorch Implementation of "Mix-LN: Unleashing the Power of Deeper Layers by Combining Pre-LN and Post-LN" by Pengxia… ☆24 · Updated 6 months ago
- Unofficial implementation of the paper: Exploring the Space of Key-Value-Query Models with Intention ☆11 · Updated 2 years ago
- ☆42 · Updated 7 months ago
- HGRN2: Gated Linear RNNs with State Expansion ☆55 · Updated 10 months ago
- [NeurIPS 2023 spotlight] Official implementation of HGRN in our NeurIPS 2023 paper - Hierarchically Gated Recurrent Neural Network for Se… ☆65 · Updated last year
- The official repo of continuous speculative decoding ☆27 · Updated 2 months ago
- ☆15 · Updated last week
- Mixture of Attention Heads ☆47 · Updated 2 years ago
- Open source community's implementation of the model from "LANGUAGE MODEL BEATS DIFFUSION — TOKENIZER IS KEY TO VISUAL GENERATION" ☆16 · Updated 7 months ago
- The repository for our paper: Neighboring Perturbations of Knowledge Editing on Large Language Models ☆16 · Updated last year
- More dimensions = More fun ☆22 · Updated 10 months ago
- [ICML 2024] When Linear Attention Meets Autoregressive Decoding: Towards More Effective and Efficient Linearized Large Language Models ☆31 · Updated last year
- Code for "Theoretical Foundations of Deep Selective State-Space Models" (NeurIPS 2024) ☆15 · Updated 5 months ago
- Xmixers: A collection of SOTA efficient token/channel mixers ☆11 · Updated 7 months ago
- Implementation of the model "Hedgehog" from the paper: "The Hedgehog & the Porcupine: Expressive Linear Attentions with Softmax Mimicry" ☆14 · Updated last year
- Official code for the paper "Attention as a Hypernetwork" ☆39 · Updated last year
- MambaFormer in-context learning experiments and implementation for https://arxiv.org/abs/2402.04248 ☆55 · Updated last year
- ☆27 · Updated 11 months ago
- Code for the paper "Cottention: Linear Transformers With Cosine Attention" ☆17 · Updated 8 months ago
- ☆21 · Updated 2 years ago