fla-org / native-sparse-attention

🐳 Efficient Triton implementations for "Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention"

☆925

Alternatives and similar repositories for native-sparse-attention

Users who are interested in native-sparse-attention compare it with the libraries listed below.

zhuzilin / ring-flash-attention
Ring attention implementation with flash attention
☆921 · Updated 2 months ago
lucidrains / native-sparse-attention-pytorch
Implementation of the sparse attention pattern proposed by the Deepseek team in their "Native Sparse Attention" paper
☆782 · Updated 3 months ago
NVlabs / Fast-dLLM
Official implementation of "Fast-dLLM: Training-free Acceleration of Diffusion LLM by Enabling KV Cache and Parallel Decoding"
☆696 · Updated last month
MoonshotAI / Moonlight
Muon is Scalable for LLM Training
☆1,365 · Updated 3 months ago
feifeibear / long-context-attention
USP: Unified (a.k.a. Hybrid, 2D) Sequence Parallel Attention for Long Context Transformers Model Training and Inference
☆602 · Updated last month
fla-org / flame
🔥 A minimal training framework for scaling FLA models
☆311 · Updated last week
XunhaoLai / native-sparse-attention-triton
Efficient Triton implementation of Native Sparse Attention.
☆250 · Updated 6 months ago
ByteDance-Seed / VeOmni
VeOmni: Scaling Any Modality Model Training with Model-Centric Distributed Recipe Zoo
☆1,338 · Updated this week
MuLabPKU / TransMLA
TransMLA: Multi-Head Latent Attention Is All You Need (NeurIPS 2025 Spotlight)
☆412 · Updated 2 months ago
QwenLM / ParScale
Parallel Scaling Law for Language Model — Beyond Parameter and Inference Time Scaling
☆451 · Updated 6 months ago
NVlabs / GatedDeltaNet
[ICLR 2025] Official PyTorch Implementation of Gated Delta Networks: Improving Mamba2 with Delta Rule
☆379 · Updated 2 months ago
meta-pytorch / attention-gym
Helpful tools and examples for working with flex-attention
☆1,059 · Updated last week
NVIDIA / Star-Attention
Efficient LLM Inference over Long Sequences
☆391 · Updated 5 months ago
SandAI-org / MagiAttention
A Distributed Attention Towards Linear Scalability for Ultra-Long Context, Heterogeneous Data Training
☆562 · Updated this week
mit-han-lab / duo-attention
[ICLR 2025] DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads
☆505 · Updated 9 months ago
JT-Ushio / MHA2MLA
Towards Economical Inference: Enabling DeepSeek's Multi-Head Latent Attention in Any Transformer-based LLMs
☆195 · Updated last month
THUDM / slime
slime is an LLM post-training framework for RL Scaling.
☆2,543 · Updated this week
sgl-project / SpecForge
Train speculative decoding models effortlessly and port them smoothly to SGLang serving.
☆483 · Updated last week
NVIDIA / kvpress
LLM KV cache compression made easy
☆694 · Updated this week
stepfun-ai / Step3
☆438 · Updated 3 months ago
NVIDIA-NeMo / RL
Scalable toolkit for efficient model reinforcement
☆1,036 · Updated last week
apple / ml-cross-entropy
☆550 · Updated 2 months ago
thinking-machines-lab / batch_invariant_ops
☆912 · Updated 3 weeks ago
lucidrains / ring-attention-pytorch
Implementation of 💍 Ring Attention, from Liu et al. at Berkeley AI, in PyTorch
☆545 · Updated 6 months ago
tensorgi / TPA
[NeurIPS 2025 Spotlight] TPA: Tensor ProducT ATTenTion Transformer (T6) (https://arxiv.org/abs/2501.06425)
☆426 · Updated last month
microsoft / SeerAttention
SeerAttention: Learning Intrinsic Sparse Attention in Your LLMs
☆173 · Updated 2 months ago
MoonshotAI / MoBA
MoBA: Mixture of Block Attention for Long-Context LLMs
☆2,006 · Updated 7 months ago
haoliuhl / ringattention
Large Context Attention
☆752 · Updated last month
yaof20 / Flash-RL
Implementation of FP8/INT8 rollout for RL training without a performance drop.
☆275 · Updated 2 weeks ago
October2001 / Awesome-KV-Cache-Compression
📰 Must-read papers on KV Cache Compression (constantly updating 🤗).
☆608 · Updated last month