thu-ml / SpargeAttn
[ICML2025] SpargeAttention: A training-free sparse attention that accelerates any model inference.
★883 · Updated last week
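For context on what "training-free sparse attention" means here, below is a minimal, illustrative PyTorch sketch of block-sparse attention that prunes low-scoring key blocks at inference time without any retraining. It is not SpargeAttn's actual API or kernel; the function name, the mean-pooling heuristic, and the `keep_ratio` threshold are assumptions chosen for clarity.

```python
# Illustrative sketch only: block-sparse attention that skips low-scoring key blocks,
# loosely in the spirit of training-free sparse attention. NOT SpargeAttn's real API.
import torch
import torch.nn.functional as F

def block_sparse_attention(q, k, v, block=64, keep_ratio=0.3):
    # q, k, v: (batch, heads, seq_len, head_dim); seq_len assumed divisible by `block`
    B, H, N, D = q.shape
    nb = N // block
    # Mean-pool queries/keys per block to cheaply estimate block-to-block importance.
    qb = q.view(B, H, nb, block, D).mean(dim=3)       # (B, H, nb, D)
    kb = k.view(B, H, nb, block, D).mean(dim=3)       # (B, H, nb, D)
    scores = qb @ kb.transpose(-1, -2) / D ** 0.5     # (B, H, nb, nb)
    # Keep only the top-scoring key blocks for each query block (assumed heuristic).
    k_keep = max(1, int(keep_ratio * nb))
    top = scores.topk(k_keep, dim=-1).indices         # (B, H, nb, k_keep)
    block_mask = torch.zeros_like(scores, dtype=torch.bool).scatter_(-1, top, True)
    # Expand the block-level mask to token level and run masked dense attention.
    mask = block_mask.repeat_interleave(block, dim=-2).repeat_interleave(block, dim=-1)
    return F.scaled_dot_product_attention(q, k, v, attn_mask=mask)

# Toy usage: 2 heads, 256 tokens, 64-dim heads.
q = torch.randn(1, 2, 256, 64)
out = block_sparse_attention(q, torch.randn_like(q), torch.randn_like(q))
print(out.shape)  # torch.Size([1, 2, 256, 64])
```

Note that this sketch still computes dense attention and merely masks pruned blocks; the repositories listed below instead skip the pruned blocks inside fused GPU kernels, which is where the actual speedup comes from.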
Alternatives and similar repositories for SpargeAttn
Users interested in SpargeAttn are comparing it to the libraries listed below.
- Model Compression Toolbox for Large Language Models and Diffusion Models · ★728 · Updated 4 months ago
- A curated list of Awesome Diffusion Inference Papers with Codes: Sampling, Cache, Quantization, Parallelism, etc. · ★482 · Updated last month
- A PyTorch-native and Flexible Inference Engine with Hybrid Cache Acceleration and Parallelism for DiTs · ★861 · Updated this week
- A sparse attention kernel supporting mixed sparse patterns · ★423 · Updated 3 weeks ago
- A Distributed Attention Towards Linear Scalability for Ultra-Long Context, Heterogeneous Data Training · ★598 · Updated this week
- [ICML2025, NeurIPS2025 Spotlight] Sparse VideoGen 1 & 2: Accelerating Video Diffusion Transformers with Sparse Attention · ★616 · Updated 3 weeks ago
- [NeurIPS 2025] Radial Attention: O(n log n) Sparse Attention with Energy Decay for Long Video Generation · ★570 · Updated last month
- https://wavespeed.ai/ Context-parallel attention that accelerates DiT model inference with dynamic caching · ★409 · Updated 6 months ago
- [ICCV2025] From Reusing to Forecasting: Accelerating Diffusion Models with TaylorSeers · ★354 · Updated 4 months ago
- ★188 · Updated 11 months ago
- [CVPR 2024 Highlight] DistriFusion: Distributed Parallel Inference for High-Resolution Diffusion Models · ★716 · Updated last year
- [ICLR2025, ICML2025, NeurIPS2025 Spotlight] Quantized Attention achieves a 2-5x speedup over FlashAttention without losing end-to-end… · ★2,987 · Updated 2 weeks ago
- [ICLR'25] ViDiT-Q: Efficient and Accurate Quantization of Diffusion Transformers for Image and Video Generation · ★144 · Updated 9 months ago
- Collection of awesome generation acceleration resources · ★375 · Updated 6 months ago
- SLA: Beyond Sparsity in Diffusion Transformers via Fine-Tunable Sparse-Linear Attention · ★221 · Updated last week
- Aims to integrate most existing feature-caching-based diffusion acceleration schemes into a unified framework · ★82 · Updated 2 months ago
- [ICLR2025] Accelerating Diffusion Transformers with Token-wise Feature Caching · ★202 · Updated 9 months ago
- USP: Unified (a.k.a. Hybrid, 2D) Sequence Parallel Attention for Long-Context Transformer Model Training and Inference · ★619 · Updated 2 weeks ago
- A parallel VAE that avoids OOM in high-resolution image generation · ★84 · Updated 5 months ago
- xDiT: A Scalable Inference Engine for Diffusion Transformers (DiTs) with Massive Parallelism · ★2,489 · Updated 3 weeks ago
- ★441 · Updated 4 months ago
- [ICML 2025] XAttention: Block Sparse Attention with Antidiagonal Scoring · ★263 · Updated 6 months ago
- An open-source implementation of Regional Adaptive Sampling (RAS), a novel diffusion model sampling strategy that introduces regional var… · ★149 · Updated 6 months ago
- Efficient Triton implementations for "Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention" · ★952 · Updated 9 months ago
- [CVPR 2024] DeepCache: Accelerating Diffusion Models for Free · ★950 · Updated last year
- Combining TeaCache with xDiT to Accelerate Visual Generation Models · ★32 · Updated 8 months ago
- High-performance inference engine for diffusion models · ★102 · Updated 4 months ago
- VeOmni: Scaling Any Modality Model Training with Model-Centric Distributed Recipe Zoo · ★1,504 · Updated this week
- Efficient Triton implementation of Native Sparse Attention · ★257 · Updated 7 months ago
- Code for Draft Attention · ★98 · Updated 7 months ago