mdy666 / Scalable-Flash-Native-Sparse-Attention
☆47 · Updated last month
Alternatives and similar repositories for Scalable-Flash-Native-Sparse-Attention
Users interested in Scalable-Flash-Native-Sparse-Attention are comparing it to the libraries listed below.
- Fast and memory-efficient exact kmeans ☆130 · Updated last month
- flex-block-attn: an efficient block sparse attention computation library ☆94 · Updated 3 weeks ago
- Vortex: A Flexible and Efficient Sparse Attention Framework ☆43 · Updated 2 weeks ago
- ☆93 · Updated last week
- Flash-Muon: An Efficient Implementation of the Muon Optimizer ☆222 · Updated 6 months ago
- [ICML 2025] SparseLoRA: Accelerating LLM Fine-Tuning with Contextual Sparsity ☆64 · Updated 5 months ago
- ☆132 · Updated 6 months ago
- Official repository for the paper Local Linear Attention: An Optimal Interpolation of Linear and Softmax Attention for Test-Time Regression ☆23 · Updated 2 months ago
- ☆125 · Updated 3 months ago
- Triton implementation of bi-directional (non-causal) linear attention ☆58 · Updated 10 months ago
- Code for the paper [ICLR 2025 Oral] FlexPrefill: A Context-Aware Sparse Attention Mechanism for Efficient Long-Sequence Inference ☆158 · Updated 2 months ago
- [ICLR 2025] COAT: Compressing Optimizer States and Activation for Memory-Efficient FP8 Training ☆251 · Updated 4 months ago
- [ICML 2025] XAttention: Block Sparse Attention with Antidiagonal Scoring ☆256 · Updated 5 months ago
- ☆155 · Updated 10 months ago
- Odysseus: Playground of LLM Sequence Parallelism ☆78 · Updated last year
- ☆207 · Updated 3 weeks ago
- Tiny-FSDP, a minimalistic re-implementation of PyTorch FSDP ☆91 · Updated 3 months ago
- Efficient Triton implementation of Native Sparse Attention ☆254 · Updated 6 months ago
- Triton implementation of FlashAttention2 that adds custom masks ☆155 · Updated last year
- An efficient implementation of the NSA (Native Sparse Attention) kernel ☆126 · Updated 5 months ago
- ☆101 · Updated 9 months ago
- The official implementation of the paper SimLayerKV: A Simple Framework for Layer-Level KV Cache Reduction ☆52 · Updated last year
- Efficient 2:4 sparse training algorithms and implementations ☆58 · Updated last year
- HALO: Hadamard-Assisted Low-Precision Optimization and Training method for fine-tuning LLMs. 🚀 The official implementation of https://arx… ☆29 · Updated 10 months ago
- ☆77 · Updated 2 weeks ago
- A curated list of recent papers on efficient video attention for video diffusion models, including sparsification, quantization, and caching ☆51 · Updated last month
- ☆62 · Updated 5 months ago
- Distributed MoE in a Single Kernel [NeurIPS '25] ☆155 · Updated this week
- Quantized Attention on GPU ☆44 · Updated last year
- 16-fold memory access reduction with nearly no loss ☆109 · Updated 8 months ago