dangxingyu / rnn-icrag
Official repository for the paper "RNNs Are Not Transformers (Yet): The Key Bottleneck on In-context Retrieval"
☆25 · Updated 10 months ago
Alternatives and similar repositories for rnn-icrag:
Users interested in rnn-icrag are comparing it to the repositories listed below
- ☆30 · Updated 11 months ago
- ☆49 · Updated 7 months ago
- ☆18 · Updated 8 months ago
- ☆80 · Updated 11 months ago
- ☆47 · Updated last year
- Long Context Extension and Generalization in LLMs ☆48 · Updated 4 months ago
- Stick-breaking attention ☆43 · Updated last month
- Xmixers: A collection of SOTA efficient token/channel mixers ☆11 · Updated 3 months ago
- HGRN2: Gated Linear RNNs with State Expansion ☆52 · Updated 6 months ago
- Here we will test various linear attention designs. ☆58 · Updated 9 months ago
- Official Code Repository for the paper "Key-value memory in the brain" ☆22 · Updated 3 weeks ago
- Code for NeurIPS 2024 Spotlight: "Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations" ☆70 · Updated 3 months ago
- Code for the paper "Diffusion Language Models Can Perform Many Tasks with Scaling and Instruction-Finetuning" ☆65 · Updated last year
- ☆17 · Updated 7 months ago
- Efficient scaling laws and collaborative pretraining ☆14 · Updated 3 weeks ago
- Official implementation of "Bootstrapping Language Models via DPO Implicit Rewards" ☆42 · Updated 6 months ago
- ☆71 · Updated 6 months ago
- Code for the arXiv preprint "The Unreasonable Effectiveness of Easy Training Data" ☆46 · Updated last year
- Using FlexAttention to compute attention with different masking patterns ☆40 · Updated 4 months ago
- Official repository for the paper "Approximating Two-Layer Feedforward Networks for Efficient Transformers" ☆36 · Updated last year
- ☆28 · Updated 3 months ago
- Repository for Skill Set Optimization ☆12 · Updated 6 months ago
- [NeurIPS 2024] 📈 Scaling Laws with Vocabulary: Larger Models Deserve Larger Vocabularies (https://arxiv.org/abs/2407.13623) ☆77 · Updated 4 months ago
- Official code for the paper "Attention as a Hypernetwork" ☆23 · Updated 7 months ago
- ☆26 · Updated last month
- Repository for the paper "500xCompressor: Generalized Prompt Compression for Large Language Models" ☆25 · Updated 6 months ago
- JAX implementation of "Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models" ☆13 · Updated 9 months ago
- ☆24 · Updated 4 months ago
- ☆33 · Updated last year
- [NAACL 2025] A Closer Look into Mixture-of-Experts in Large Language Models ☆43 · Updated 2 weeks ago