dangxingyu / rnn-icrag

Official repository of paper "RNNs Are Not Transformers (Yet): The Key Bottleneck on In-context Retrieval"

☆27

Alternatives and similar repositories for rnn-icrag

Users who are interested in rnn-icrag are comparing it to the repositories listed below.

sjelassi / transformers_ssm_copy
☆33 Updated last year
sustcsonglin / mamba-triton
☆48 Updated last year
OpenNLPLab / HGRN2
HGRN2: Gated Linear RNNs with State Expansion
☆54 Updated last year
BlinkDL / LinearAttentionArena
Here we will test various linear attention designs.
☆61 Updated last year
HazyResearch / prefix-linear-attention
☆56 Updated last year
yikangshen / megablocks
☆20 Updated last year
glassroom / heinsen_attention
Reference implementation of "Softmax Attention with Constant Cost per Token" (Heinsen, 2024)
☆24 Updated last year
epfml / schedules-and-scaling
Code for NeurIPS 2024 Spotlight: "Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations"
☆84 Updated 11 months ago
smonsays / hypernetwork-attention
Official code for the paper "Attention as a Hypernetwork"
☆44 Updated last year
shreyansh26 / Attention-Mask-Patterns
Using FlexAttention to compute attention with different masking patterns
☆47 Updated last year
Edward-Sun / gpt-accelera
Simple and efficient pytorch-native transformer training and inference (batched)
☆78 Updated last year
gregorbachmann / Next-Token-Failures
☆103 Updated last year
shawntan / stickbreaking-attention
Stick-breaking attention
☆61 Updated 3 months ago
OpenNLPLab / HGRN
[NeurIPS 2023 spotlight] Official implementation of HGRN in our NeurIPS 2023 paper - Hierarchically Gated Recurrent Neural Network for Se…
☆65 Updated last year
RobertCsordas / moeut
☆86 Updated last year
allenai / easy-to-hard-generalization
Code for the arXiv preprint "The Unreasonable Effectiveness of Easy Training Data"
☆48 Updated last year
amirzandieh / HyperAttention
Triton Implementation of HyperAttention Algorithm
☆48 Updated last year
Asap7772 / understanding-rlhf
Learning from preferences is a common paradigm for fine-tuning language models. Yet, many algorithmic design decisions come into play. Ou…
☆32 Updated last year
kazuki-irie / kv-memory-brain
Official Code Repository for the paper "Key-value memory in the brain"
☆29 Updated 8 months ago
Leooyii / LCEG
Long Context Extension and Generalization in LLMs
☆62 Updated last year
janphilippfranken / sami
Self-Supervised Alignment with Mutual Information
☆21 Updated last year
Doraemonzzz / xmixers
Xmixers: A collection of SOTA efficient token/channel mixers
☆29 Updated last month
TRI-ML / linear_open_lm
A repository for research on medium sized language models.
☆78 Updated last year
sail-sg / SkyLadder
The official repository for SkyLadder: Better and Faster Pretraining via Context Window Scheduling
☆35 Updated last week
hamishivi / automated-instruction-selection
Exploration of automated dataset selection approaches at large scales.
☆47 Updated 7 months ago
srush / LLM-Talk
☆52 Updated last year
sustcsonglin / gated_linear_attention_layer
☆31 Updated last year
kamanphoebe / Look-into-MoEs
[NAACL 2025] A Closer Look into Mixture-of-Experts in Large Language Models
☆55 Updated 8 months ago
PKU-ML / LongPPL
Code for ICLR 2025 Paper "What is Wrong with Perplexity for Long-context Language Modeling?"
☆102 Updated last week
sail-sg / dice
Official implementation of Bootstrapping Language Models via DPO Implicit Rewards
☆44 Updated 6 months ago