mit-han-lab / x-attentionLinks

[ICML 2025] XAttention: Block Sparse Attention with Antidiagonal Scoring

☆256

Alternatives and similar repositories for x-attention

Users that are interested in x-attention are comparing it to the libraries listed below

Sorting:

ByteDance-Seed / FlexPrefill
Code for paper: [ICLR2025 Oral] FlexPrefill: A Context-Aware Sparse Attention Mechanism for Efficient Long-Sequence Inference
☆155Updated last month
mit-han-lab / Block-Sparse-Attention
A sparse attention kernel supporting mix sparse patterns
☆391Updated 9 months ago
mit-han-lab / flash-moba
☆201Updated 2 weeks ago
XunhaoLai / native-sparse-attention-triton
Efficient triton implementation of Native Sparse Attention.
☆251Updated 6 months ago
z-lab / sparselora
[ICML 2025] SparseLoRA: Accelerating LLM Fine-Tuning with Contextual Sparsity
☆61Updated 4 months ago
attention-survey / Efficient_Attention_Survey
A Survey of Efficient Attention Methods: Hardware-efficient, Sparse, Compact, and Linear Attention
☆240Updated this week
andy-yang-1 / DoubleSparse
16-fold memory access reduction with nearly no loss
☆108Updated 8 months ago
NVlabs / COAT
[ICLR 2025] COAT: Compressing Optimizer States and Activation for Memory-Efficient FP8 Training
☆250Updated 3 months ago
microsoft / SeerAttention
SeerAttention: Learning Intrinsic Sparse Attention in Your LLMs
☆175Updated 2 months ago
hao-ai-lab / Awesome-Video-Attention
A curated list of recent papers on efficient video attention for video diffusion models, including sparsification, quantization, and cach…
☆49Updated last month
thu-nics / MoA
[CoLM'25] The official implementation of the paper <MoA: Mixture of Sparse Attention for Automatic Large Language Model Compression>
☆151Updated last week
Dao-AILab / grouped-latent-attention
☆132Updated 6 months ago
maomaocun / dLLM-cache
Official PyTorch implementation of the paper "dLLM-Cache: Accelerating Diffusion Large Language Models with Adaptive Caching" (dLLM-Cache…
☆185Updated 2 weeks ago
shadowpa0327 / Palu
[ICLR 2025] Palu: Compressing KV-Cache with Low-Rank Projection
☆148Updated 9 months ago
fla-org / flame
🔥 A minimal training framework for scaling FLA models
☆311Updated 2 weeks ago
thu-ml / SLA
SLA: Beyond Sparsity in Diffusion Transformers via Fine-Tunable Sparse–Linear Attention
☆142Updated 3 weeks ago
thu-nics / DiTFastAttn
☆187Updated 10 months ago
alexzhang13 / flashattention2-custom-mask
Triton implementation of FlashAttention2 that adds Custom Masks.
☆151Updated last year
mit-han-lab / Quest
[ICML 2024] Quest: Query-Aware Sparsity for Efficient Long-Context LLM Inference
☆356Updated 4 months ago
horseee / dKV-Cache
[NeurIPS'25] dKV-Cache: The Cache for Diffusion Language Models
☆121Updated 6 months ago
FasterDecoding / TEAL
☆154Updated 9 months ago
thu-nics / ViDiT-Q
[ICLR'25] ViDiT-Q: Efficient and Accurate Quantization of Diffusion Transformers for Image and Video Generation
☆138Updated 8 months ago
svg-project / flash-kmeans
Fast and memory-efficient exact kmeans
☆127Updated 3 weeks ago
Aaronhuang-778 / Mixture-Compressor-MoE
[ICLR 2025] Mixture Compressor for Mixture-of-Experts LLMs Gains More
☆62Updated 9 months ago
OpenSparseLLMs / Linearization
☆61Updated 4 months ago
tilde-research / nsa-impl
An efficient implementation of the NSA (Native Sparse Attention) kernel
☆126Updated 5 months ago
ChenMnZ / PrefixQuant
An algorithm for weight-activation quantization (W4A4, W4A8) of LLMs, supporting both static and dynamic quantization
☆166Updated last week
thu-ml / ReMoE
[ICLR2025] Codebase for "ReMoE: Fully Differentiable Mixture-of-Experts with ReLU Routing", built on Megatron-LM.
☆99Updated 11 months ago
yaof20 / Flash-RL
Implementation for FP8/INT8 Rollout for RL training without performence drop.
☆275Updated 3 weeks ago
FFY0 / AdaKV
The Official Implementation of Ada-KV [NeurIPS 2025]
☆116Updated last week