tensorgi / T6Links

The official implementation of Tensor ProducT ATTenTion Transformer (T6) (https://arxiv.org/abs/2501.06425)

☆373

Alternatives and similar repositories for T6

Users that are interested in T6 are comparing it to the libraries listed below

Sorting:

lucidrains / native-sparse-attention-pytorch
Implementation of the sparse attention pattern proposed by the Deepseek team in their "Native Sparse Attention" paper
☆637Updated 2 weeks ago
apple / ml-sigmoid-attention
☆286Updated last month
Haiyang-W / TokenFormer
[ICLR2025 Spotlight🔥] Official Implementation of TokenFormer: Rethinking Transformer Scaling with Tokenized Model Parameters
☆559Updated 3 months ago
OpenNLPLab / lightning-attention
Lightning Attention-2: A Free Lunch for Handling Unlimited Sequence Lengths in Large Language Models
☆294Updated 3 months ago
fxmeng / TransMLA
TransMLA: Multi-Head Latent Attention Is All You Need
☆284Updated this week
NVlabs / GatedDeltaNet
[ICLR 2025] Official PyTorch Implementation of Gated Delta Networks: Improving Mamba2 with Delta Rule
☆166Updated 2 months ago
KellerJordan / Muon
Muon optimizer: +>30% sample efficiency with <3% wallclock overhead
☆661Updated this week
test-time-training / ttt-lm-jax
Official JAX implementation of Learning to (Learn at Test Time): RNNs with Expressive Hidden States
☆411Updated 9 months ago
kyleliang919 / C-Optim
When it comes to optimizers, it's always better to be safe than sorry
☆233Updated 2 months ago
pengzhangzhi / Awesome-Mamba
Awesome list of papers that extend Mamba to various applications.
☆133Updated last month
fla-org / native-sparse-attention
🐳 Efficient Triton implementations for "Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention"
☆686Updated 2 months ago
jxiw / MambaInLlama
[NeurIPS 2024] Official Repository of The Mamba in the Llama: Distilling and Accelerating Hybrid Models
☆221Updated 3 weeks ago
NVIDIA / Star-Attention
Efficient LLM Inference over Long Sequences
☆376Updated this week
kyegomez / Jamba
PyTorch Implementation of Jamba: "Jamba: A Hybrid Transformer-Mamba Language Model"
☆169Updated last month
AmeenAli / HiddenMambaAttn
Official PyTorch Implementation of "The Hidden Attention of Mamba Models"
☆222Updated last year
kuleshov-group / bd3lms
Block Diffusion: Interpolating Between Autoregressive and Diffusion Language Models
☆660Updated last month
dhcode-cpp / NSA-pytorch
DeepSeek Native Sparse Attention pytorch implementation
☆70Updated 2 months ago
jzhang38 / LongMamba
Some preliminary explorations of Mamba's context scaling.
☆212Updated last year
hkproj / mamba-notes
Notes on the Mamba and the S4 model (Mamba: Linear-Time Sequence Modeling with Selective State Spaces)
☆168Updated last year
lucidrains / nGPT-pytorch
Quick implementation of nGPT, learning entirely on the hypersphere, from NvidiaAI
☆281Updated 2 months ago
NVIDIA / ngpt
Normalized Transformer (nGPT)
☆181Updated 6 months ago
transformer-vq / transformer_vq
☆191Updated last year
TsinghuaC3I / Fourier-Position-Embedding
[ICML 2025] Fourier Position Embedding: Enhancing Attention’s Periodic Extension for Length Generalization
☆69Updated 4 months ago
pytorch-labs / attention-gym
Helpful tools and examples for working with flex-attention
☆802Updated last week
itsnamgyu / block-transformer
Block Transformer: Global-to-Local Language Modeling for Fast Inference (NeurIPS 2024)
☆155Updated last month
AvivBick / awesome-ssm-ml
Reading list for research topics in state-space models
☆289Updated this week
fla-org / flame
🔥 A minimal training framework for scaling FLA models
☆141Updated 2 weeks ago
llm-random / llm-random
☆190Updated this week
Dao-AILab / causal-conv1d
Causal depthwise conv1d in CUDA, with a PyTorch interface
☆465Updated last week
lucidrains / ring-attention-pytorch
Implementation of 💍 Ring Attention, from Liu et al. at Berkeley AI, in Pytorch
☆513Updated 2 weeks ago