fla-org / native-sparse-attention
🐳 Efficient Triton implementations for "Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention"
⭐702 · Updated 3 months ago
Alternatives and similar repositories for native-sparse-attention
Users interested in native-sparse-attention are comparing it to the libraries listed below
- Muon is Scalable for LLM Training ⭐1,081 · Updated 2 months ago
- Implementation of the sparse attention pattern proposed by the DeepSeek team in their "Native Sparse Attention" paper ⭐657 · Updated 2 weeks ago
- Ring attention implementation with flash attention ⭐789 · Updated last week
- USP: Unified (a.k.a. Hybrid, 2D) Sequence Parallel Attention for Long-Context Transformer Model Training and Inference ⭐519 · Updated 3 weeks ago
- Muon: An optimizer for hidden layers in neural networks ⭐897 · Updated 2 weeks ago
- TransMLA: Multi-Head Latent Attention Is All You Need ⭐310 · Updated this week
- Parallel Scaling Law for Language Models: Beyond Parameter and Inference Time Scaling ⭐395 · Updated last month
- Efficient Triton implementation of Native Sparse Attention. ⭐168 · Updated last month
- [ICLR 2025] DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads ⭐466 · Updated 4 months ago
- Helpful tools and examples for working with flex-attention ⭐831 · Updated 2 weeks ago
- Towards Economical Inference: Enabling DeepSeek's Multi-Head Latent Attention in Any Transformer-based LLMs ⭐176 · Updated this week
- 🔥 A minimal training framework for scaling FLA models ⭐178 · Updated last week
- [ICLR 2025] COAT: Compressing Optimizer States and Activation for Memory-Efficient FP8 Training ⭐210 · Updated last week
- slime is an LLM post-training framework aimed at scaling RL. ⭐328 · Updated this week
- Efficient LLM Inference over Long Sequences ⭐378 · Updated 3 weeks ago
- VeOmni: Scaling any-modality model training to any accelerator with a PyTorch-native training framework ⭐355 · Updated last month
- Scalable toolkit for efficient model reinforcement ⭐438 · Updated this week
- LLM KV cache compression made easy ⭐520 · Updated this week
- Block Diffusion: Interpolating Between Autoregressive and Diffusion Language Models ⭐703 · Updated 2 months ago
- Understanding R1-Zero-Like Training: A Critical Perspective ⭐991 · Updated last month
- Lightning Attention-2: A Free Lunch for Handling Unlimited Sequence Lengths in Large Language Models ⭐313 · Updated 4 months ago
- Memory layers use a trainable key-value lookup mechanism to add extra parameters to a model without increasing FLOPs. Conceptually, spars… (a minimal sketch of this idea follows the list) ⭐339 · Updated 6 months ago
- SpargeAttention: A training-free sparse attention method that can accelerate inference for any model. ⭐620 · Updated this week
- [ICML 2024] CLLMs: Consistency Large Language Models ⭐395 · Updated 7 months ago
- A sparse attention kernel supporting mixed sparse patterns ⭐238 · Updated 4 months ago
- 📰 Must-read papers on KV Cache Compression (constantly updating 🤗). ⭐459 · Updated this week
- MoBA: Mixture of Block Attention for Long-Context LLMs ⭐1,803 · Updated 2 months ago
- Implementation of 💍 Ring Attention, from Liu et al. at Berkeley AI, in Pytorch ⭐521 · Updated last month
- ⭐191 · Updated 2 months ago
- ⭐789 · Updated 2 weeks ago
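
The memory-layers entry above describes its technique in enough detail to sketch: a trainable key-value lookup in which each token's query selects a few learned keys and returns a score-weighted mixture of the matching learned values, so parameter count grows with the number of memory slots while per-token compute grows only with the top-k. The PyTorch module below is a minimal illustrative sketch of that idea under those assumptions; the class name, slot count, and top-k value are made up here and this is not the API of any repository listed above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class KeyValueMemory(nn.Module):
    """Hypothetical sketch of a trainable key-value memory lookup layer."""

    def __init__(self, dim: int, num_slots: int = 4096, top_k: int = 8):
        super().__init__()
        self.top_k = top_k
        self.query_proj = nn.Linear(dim, dim)  # map hidden states to queries
        # Learned memory: num_slots extra rows of key/value parameters.
        self.keys = nn.Parameter(torch.randn(num_slots, dim) * dim ** -0.5)
        self.values = nn.Parameter(torch.randn(num_slots, dim) * dim ** -0.5)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, dim)
        q = self.query_proj(x)                      # (batch, seq, dim)
        scores = q @ self.keys.t()                  # (batch, seq, num_slots)
        # Sparse lookup: only the top-k slots per token contribute.
        top_scores, top_idx = scores.topk(self.top_k, dim=-1)
        weights = F.softmax(top_scores, dim=-1)     # (batch, seq, top_k)
        selected = self.values[top_idx]             # (batch, seq, top_k, dim)
        return (weights.unsqueeze(-1) * selected).sum(dim=-2)


if __name__ == "__main__":
    layer = KeyValueMemory(dim=64)
    out = layer(torch.randn(2, 16, 64))
    print(out.shape)  # torch.Size([2, 16, 64])
```

At realistic slot counts the dense `scores` matmul above would dominate; larger memory layers typically replace it with a cheaper structured lookup (for example product-key search), but the full matmul is kept here for readability.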