belindal / state-trackingLinks

Code and data for paper "(How) do Language Models Track State?"

☆20

Alternatives and similar repositories for state-tracking

Users that are interested in state-tracking are comparing it to the libraries listed below

Sorting:

GindaChen / FlexFlashAttention3
FlexAttention w/ FlashAttention3 Support
☆27Updated last year
dangxingyu / rnn-icrag
Official repository of paper "RNNs Are Not Transformers (Yet): The Key Bottleneck on In-context Retrieval"
☆27Updated last year
google-deepmind / spectral_ssm
☆34Updated last year
BlinkDL / LinearAttentionArena
Here we will test various linear attention designs.
☆61Updated last year
Ryu1845 / hyena-jax
Implementation of Hyena Hierarchy in JAX
☆10Updated 2 years ago
smonsays / hypernetwork-attention
Official code for the paper "Attention as a Hypernetwork"
☆45Updated last year
shreyansh26 / Attention-Mask-Patterns
Using FlexAttention to compute attention with different masking patterns
☆47Updated last year
proger / nanokitchen
Parallel Associative Scan for Language Models
☆17Updated last year
OpenNLPLab / HGRN2
HGRN2: Gated Linear RNNs with State Expansion
☆55Updated last year
Doraemonzzz / xmixers
Xmixers: A collection of SOTA efficient token/channel mixers
☆29Updated 2 months ago
sail-sg / SkyLadder
The official repository for SkyLadder: Better and Faster Pretraining via Context Window Scheduling
☆35Updated 2 weeks ago
Doraemonzzz / Awesome-Triton-Resources
Awesome Triton Resources
☆36Updated 6 months ago
yikangshen / megablocks
☆20Updated last year
bdusell / stack-attention
Code for the paper "Stack Attention: Improving the Ability of Transformers to Model Hierarchical Patterns"
☆18Updated last year
kazuki-irie / kv-memory-brain
Official Code Repository for the paper "Key-value memory in the brain"
☆29Updated 8 months ago
RodkinIvan / associative-recurrent-memory-transformer
[ICML 24 NGSM workshop] Associative Recurrent Memory Transformer implementation and scripts for training and evaluation
☆56Updated this week
shangshang-wang / Resa
Resa: Transparent Reasoning Models via SAEs
☆44Updated last month
epfml / schedules-and-scaling
Code for NeurIPS 2024 Spotlight: "Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations"
☆84Updated last year
Infini-AI-Lab / Kinetics
Kinetics: Rethinking Test-Time Scaling Laws
☆81Updated 3 months ago
allenai / signal-and-noise
Measuring the Signal to Noise Ratio in Language Model Evaluation
☆25Updated 2 months ago
sustcsonglin / mamba-triton
☆48Updated last year
edwardmilsom / function-space-learning-rates-paper
Code for the paper "Function-Space Learning Rates"
☆23Updated 5 months ago
howard-hou / RWKV-X
RWKV-X is a Linear Complexity Hybrid Language Model based on the RWKV architecture, integrating Sparse Attention to improve the model's l…
☆51Updated 3 months ago
Doraemonzzz / hgru2-pytorch
☆23Updated last year
jxiw / M1
M1: Towards Scalable Test-Time Compute with Mamba Reasoning Models
☆44Updated 3 months ago
hazan-lab / flash-stu
PyTorch implementation of the Flash Spectral Transform Unit.
☆19Updated last year
IBM / ColPret
Efficient Scaling laws and collaborative pretraining.
☆18Updated last month
automl / unlocking_state_tracking
Expanding linear RNN state-transition matrix eigenvalues to include negatives improves state-tracking tasks and language modeling without…
☆17Updated 7 months ago
zaydzuhri / flame
Fork of Flame repo for training of some new stuff in development
☆18Updated 3 weeks ago
amirzandieh / HyperAttention
Triton Implementation of HyperAttention Algorithm
☆48Updated last year