test-time-training / ttt-tkLinks

☆41

Alternatives and similar repositories for ttt-tk

Users that are interested in ttt-tk are comparing it to the libraries listed below

Sorting:

shreyansh26 / Attention-Mask-Patterns
Using FlexAttention to compute attention with different masking patterns
☆47Updated last year
BlinkDL / LinearAttentionArena
Here we will test various linear attention designs.
☆61Updated last year
shawntan / stickbreaking-attention
Stick-breaking attention
☆61Updated 3 months ago
samsja / muon_fsdp_2
Muon fsdp 2
☆44Updated 2 months ago
RobertCsordas / moeut
☆86Updated last year
HazyResearch / prefix-linear-attention
☆56Updated last year
tilde-research / nsa-impl
An efficient implementation of the NSA (Native Sparse Attention) kernel
☆119Updated 4 months ago
nil0x9 / flash-muon
Flash-Muon: An Efficient Implementation of Muon Optimizer
☆195Updated 4 months ago
PiotrNawrot / sparse-frontier
The evaluation framework for training-free sparse attention in LLMs
☆101Updated last week
maximzubkov / fft-scan
Efficient PScan implementation in PyTorch
☆16Updated last year
sustcsonglin / mamba-triton
☆48Updated last year
berlino / seq_icl
☆53Updated last year
xiayuqing0622 / flex_head_fa
Fast and memory-efficient exact attention
☆71Updated 7 months ago
amirzandieh / HyperAttention
Triton Implementation of HyperAttention Algorithm
☆48Updated last year
Zyphra / tree_attention
Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU clusters
☆130Updated 10 months ago
thinking-machines-lab / manifolds
Supporting code for the blog post on modular manifolds.
☆86Updated 3 weeks ago
Infini-AI-Lab / gsm_infinite
☆55Updated 4 months ago
zhixuan-lin / forgetting-transformer
[ICLR 2025 & COLM 2025] Official PyTorch implementation of the Forgetting Transformer and Adaptive Computation Pruning
☆131Updated last month
epfml / schedules-and-scaling
Code for NeurIPS 2024 Spotlight: "Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations"
☆84Updated 11 months ago
ethansmith2000 / fsdp_optimizers
supporting pytorch FSDP for optimizers
☆83Updated 10 months ago
glassroom / heinsen_attention
Reference implementation of "Softmax Attention with Constant Cost per Token" (Heinsen, 2024)
☆24Updated last year
sustcsonglin / linear-attention-and-beyond-slides
☆93Updated 7 months ago
HanGuo97 / log-linear-attention
☆251Updated 4 months ago
PiotrNawrot / nano-sparse-attention
The simplest implementation of recent Sparse Attention patterns for efficient LLM inference.
☆92Updated 3 months ago
Edward-Sun / gpt-accelera
Simple and efficient pytorch-native transformer training and inference (batched)
☆78Updated last year
cloneofsimo / min-fsdp
☆91Updated last year
srush / mamba-primer
☆38Updated last year
insuhan / hyper-attn
☆83Updated last year
Doraemonzzz / xmixers
Xmixers: A collection of SOTA efficient token/channel mixers
☆29Updated last month
fal-ai-community / nano-mdm
Tiny re-implementation of MDM in style of LLaDA and nano-gpt speedrun
☆56Updated 7 months ago