fla-org / native-sparse-attention
🐳 Efficient Triton implementations for "Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention"
☆778 · Updated 4 months ago
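The technique this repository implements, as described in the NSA paper, runs three attention branches in parallel (compressed tokens, selected blocks, and a sliding window) and mixes their outputs with query-dependent gates. Below is a minimal sketch of that gated combination only, not the repository's Triton kernels or API; the branch outputs and the `gate_proj` layer are illustrative placeholders.

```python
# Minimal sketch of NSA's gated three-branch combination (not this repo's API).
# Assumes the three branch outputs (compressed-token, selected-block, and
# sliding-window attention) were already computed; gate_proj is a hypothetical
# gating head, e.g. nn.Linear(dim, 3).
import torch
import torch.nn as nn

def nsa_combine(q, out_cmp, out_slc, out_swa, gate_proj: nn.Linear):
    """q and each out_*: [batch, seq, dim]; gate_proj maps dim -> 3 gates."""
    gates = torch.sigmoid(gate_proj(q))            # [batch, seq, 3], one gate per branch
    g_cmp, g_slc, g_swa = gates.unbind(dim=-1)     # each [batch, seq]
    return (g_cmp.unsqueeze(-1) * out_cmp          # compressed-token branch
            + g_slc.unsqueeze(-1) * out_slc        # selected-block branch
            + g_swa.unsqueeze(-1) * out_swa)       # sliding-window branch
```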
Alternatives and similar repositories for native-sparse-attention
Users interested in native-sparse-attention are comparing it to the libraries listed below.
- Ring attention implementation with flash attention ☆828 · Updated last week
- Implementation of the sparse attention pattern proposed by the DeepSeek team in their "Native Sparse Attention" paper ☆700 · Updated last month
- Muon is Scalable for LLM Training ☆1,240 · Updated this week
- slime is an LLM post-training framework aiming for RL scaling ☆1,113 · Updated this week
- TransMLA: Multi-Head Latent Attention Is All You Need ☆335 · Updated 3 weeks ago
- USP: Unified (a.k.a. Hybrid, 2D) Sequence Parallel Attention for Long Context Transformers Model Training and Inference ☆537 · Updated 2 weeks ago
- VeOmni: Scaling Any-Modality Model Training to Any Accelerator with a PyTorch-Native Training Framework ☆399 · Updated this week
- Parallel Scaling Law for Language Models: Beyond Parameter and Inference Time Scaling ☆428 · Updated 2 months ago
- [ICLR 2025] DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads ☆480 · Updated 5 months ago
- Scalable toolkit for efficient model reinforcement ☆578 · Updated this week
- Helpful tools and examples for working with flex-attention (see the sketch after this list) ☆908 · Updated 2 weeks ago
- Efficient Triton implementation of Native Sparse Attention ☆186 · Updated 2 months ago
- 🔥 A minimal training framework for scaling FLA models ☆220 · Updated last month
- Towards Economical Inference: Enabling DeepSeek's Multi-Head Latent Attention in Any Transformer-based LLMs ☆183 · Updated last month
- Official implementation of "Fast-dLLM: Training-free Acceleration of Diffusion LLM by Enabling KV Cache and Parallel Decoding" ☆320 · Updated this week
- ☆326 · Updated this week
- Efficient LLM Inference over Long Sequences ☆385 · Updated last month
- Muon is an optimizer for hidden layers in neural networks ☆1,390 · Updated 3 weeks ago
- A Distributed Attention Towards Linear Scalability for Ultra-Long Context, Heterogeneous Data Training ☆456 · Updated this week
- LLM KV cache compression made easy ☆566 · Updated this week
- ☆506 · Updated last week
- Understanding R1-Zero-Like Training: A Critical Perspective ☆1,055 · Updated last week
- ☆802 · Updated last month
- Implementation of 💍 Ring Attention, from Liu et al. at Berkeley AI, in PyTorch ☆533 · Updated 2 months ago
- Super-Efficient RLHF Training of LLMs with Parameter Reallocation ☆307 · Updated 3 months ago
- [ICML 2024] CLLMs: Consistency Large Language Models ☆397 · Updated 8 months ago
- ☆198 · Updated 3 months ago
- Microsoft Automatic Mixed Precision Library ☆616 · Updated 10 months ago
- ByteCheckpoint: A Unified Checkpointing Library for LFMs ☆229 · Updated 3 weeks ago
- MoBA: Mixture of Block Attention for Long-Context LLMs ☆1,851 · Updated 4 months ago
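For the flex-attention entry above, here is a minimal usage sketch, assuming PyTorch 2.5+ and a CUDA device. It illustrates the stock `torch.nn.attention.flex_attention` API with a causal block mask, not anything specific to that repository.

```python
# Minimal flex-attention sketch (assumes PyTorch >= 2.5 and a CUDA device).
import torch
from torch.nn.attention.flex_attention import flex_attention, create_block_mask

B, H, S, D = 2, 4, 1024, 64
q, k, v = (torch.randn(B, H, S, D, device="cuda") for _ in range(3))

def causal(b, h, q_idx, kv_idx):
    # Keep only key positions at or before the query position.
    return q_idx >= kv_idx

# B=None / H=None broadcast the mask over batch and heads.
block_mask = create_block_mask(causal, B=None, H=None, Q_LEN=S, KV_LEN=S)
out = flex_attention(q, k, v, block_mask=block_mask)  # [B, H, S, D]
```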