XunhaoLai / native-sparse-attention-triton
Efficient triton implementation of Native Sparse Attention.
★186 · Updated 2 months ago
Alternatives and similar repositories for native-sparse-attention-triton
Users interested in native-sparse-attention-triton are comparing it to the libraries listed below.
- 🔥 A minimal training framework for scaling FLA models ★209 · Updated last month
- Triton implementation of FlashAttention2 that adds custom masks ★128 · Updated 11 months ago
- Flash-Muon: An Efficient Implementation of the Muon Optimizer ★149 · Updated last month
- ★123 · Updated 2 months ago
- [ICML 2025] XAttention: Block Sparse Attention with Antidiagonal Scoring ★213 · Updated 3 weeks ago
- [ICLR 2025] COAT: Compressing Optimizer States and Activations for Memory-Efficient FP8 Training ★218 · Updated last month
- ★228 · Updated last month
- An efficient implementation of the NSA (Native Sparse Attention) kernel ★108 · Updated last month
- SeerAttention: Learning Intrinsic Sparse Attention in Your LLMs ★141 · Updated this week
- Code for the paper "[ICLR 2025 Oral] FlexPrefill: A Context-Aware Sparse Attention Mechanism for Efficient Long-Sequence Inference" ★124 · Updated 2 months ago
- Official implementation of "Fast-dLLM: Training-free Acceleration of Diffusion LLM by Enabling KV Cache and Parallel Decoding" ★320 · Updated this week
- A sparse attention kernel supporting mixed sparse patterns ★262 · Updated 5 months ago
- ★79 · Updated 5 months ago
- Towards Economical Inference: Enabling DeepSeek's Multi-Head Latent Attention in Any Transformer-based LLMs ★183 · Updated last month
- [ICLR 2025] Codebase for "ReMoE: Fully Differentiable Mixture-of-Experts with ReLU Routing", built on Megatron-LM ★85 · Updated 7 months ago
- [CoLM'25] The official implementation of the paper "MoA: Mixture of Sparse Attention for Automatic Large Language Model Compression" ★141 · Updated 3 weeks ago
- ★137 · Updated 5 months ago
- [ICLR 2025] DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads ★480 · Updated 5 months ago
- VeOmni: Scaling Any-Modality Model Training to Any Accelerator with a PyTorch-Native Training Framework ★399 · Updated this week
- ★113 · Updated last month
- A curated list of recent papers on efficient video attention for video diffusion models, including sparsification, quantization, and cach… ★32 · Updated 2 weeks ago
- An evaluation framework for training-free sparse attention in LLMs ★86 · Updated last month
- Triton-based implementation of Sparse Mixture of Experts ★230 · Updated 8 months ago
- Official repository for DistFlashAttn: Distributed Memory-Efficient Attention for Long-Context LLM Training ★212 · Updated 11 months ago
- ★81 · Updated last week
- 16-fold memory-access reduction with nearly no loss ★102 · Updated 4 months ago
- ★326 · Updated this week
- qwen-nsa ★70 · Updated 3 months ago
- ★54 · Updated 3 weeks ago
- Low-bit optimizers for PyTorch ★130 · Updated last year