thu-ml / SpargeAttn
[ICML2025] SpargeAttention: A training-free sparse attention that accelerates any model inference.
★785 · Updated last week
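The headline idea, training-free sparse attention, can be sketched in a few lines: score key blocks with cheap block-mean statistics and skip the blocks a query block barely attends to. Below is an illustrative PyTorch sketch of that general pattern, not SpargeAttn's actual predictor or CUDA kernel; `block_sparse_attention`, the block size, and the keep ratio are all hypothetical.

```python
import torch
import torch.nn.functional as F

def block_sparse_attention(q, k, v, block=64, keep=0.3):
    """Illustrative training-free block-sparse attention.

    Scores key blocks by block-mean similarity to each query block and
    masks out low-scoring blocks: the general idea behind methods like
    SpargeAttention, not the repo's exact predictor or kernel.

    q, k, v: (batch, heads, seq, dim); seq must be a multiple of `block`.
    """
    b, h, n, d = q.shape
    nb = n // block
    # Block means act as a cheap low-resolution summary of q and k.
    qm = q.view(b, h, nb, block, d).mean(3)              # (b, h, nb, d)
    km = k.view(b, h, nb, block, d).mean(3)
    coarse = qm @ km.transpose(-1, -2) / d**0.5          # (b, h, nb, nb)
    # Keep only the highest-scoring key blocks per query block.
    k_keep = max(1, int(keep * nb))
    idx = coarse.topk(k_keep, dim=-1).indices            # (b, h, nb, k_keep)
    mask = torch.zeros_like(coarse, dtype=torch.bool).scatter_(-1, idx, True)
    # Expand the block mask to token resolution and apply it densely;
    # a real kernel would skip the masked tiles entirely.
    tok_mask = mask.repeat_interleave(block, -2).repeat_interleave(block, -1)
    scores = (q @ k.transpose(-1, -2)) / d**0.5
    scores = scores.masked_fill(~tok_mask, float("-inf"))
    return F.softmax(scores, dim=-1) @ v
```

The speedup in a real implementation comes from never materializing the masked tiles; this dense sketch only shows where the sparsity decision is made.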
Alternatives and similar repositories for SpargeAttn
Users interested in SpargeAttn are comparing it to the libraries listed below.
- A Distributed Attention Towards Linear Scalability for Ultra-Long Context, Heterogeneous Data Training ★562 · Updated this week
- A curated list of Awesome Diffusion Inference Papers with Codes: Sampling, Cache, Quantization, Parallelism, etc. ★447 · Updated 3 months ago
- Model Compression Toolbox for Large Language Models and Diffusion Models ★698 · Updated 3 months ago
- A PyTorch-native Inference Engine with Hybrid Cache Acceleration and Parallelism for DiTs. ★588 · Updated this week
- A sparse attention kernel supporting mixed sparse patterns ★385 · Updated 9 months ago
- [ICML2025, NeurIPS2025 Spotlight] Sparse VideoGen 1 & 2: Accelerating Video Diffusion Transformers with Sparse Attention ★585 · Updated last week
- ★187 · Updated 10 months ago
- [ICCV2025] From Reusing to Forecasting: Accelerating Diffusion Models with TaylorSeers (see the forecasting sketch after this list) ★334 · Updated 3 months ago
- https://wavespeed.ai/ Context parallel attention that accelerates DiT model inference with dynamic caching ★388 · Updated 4 months ago
- [NeurIPS 2025] Radial Attention: O(n log n) Sparse Attention with Energy Decay for Long Video Generation ★558 · Updated 2 weeks ago
- [CVPR 2024 Highlight] DistriFusion: Distributed Parallel Inference for High-Resolution Diffusion Models ★714 · Updated 11 months ago
- [ICLR'25] ViDiT-Q: Efficient and Accurate Quantization of Diffusion Transformers for Image and Video Generation ★135 · Updated 8 months ago
- [ICLR2025] Accelerating Diffusion Transformers with Token-wise Feature Caching ★195 · Updated 8 months ago
- Collection of awesome generation acceleration resources. ★365 · Updated 4 months ago
- USP: Unified (a.k.a. Hybrid, 2D) Sequence Parallel Attention for Long Context Transformers Model Training and Inference ★602 · Updated last month
- Aiming to integrate most existing feature caching-based diffusion acceleration schemes into a unified framework. ★77 · Updated last month
- A parallel VAE that avoids OOM for high-resolution image generation ★83 · Updated 3 months ago
- ★438 · Updated 3 months ago
- SLA: Beyond Sparsity in Diffusion Transformers via Fine-Tunable Sparse-Linear Attention ★140 · Updated 2 weeks ago
- [ICLR2025, ICML2025, NeurIPS2025 Spotlight] Quantized Attention achieves speedup of 2-5x compared to FlashAttention, without losing end-to-end metrics… (see the quantized-attention usage sketch after this list) ★2,709 · Updated 3 weeks ago
- Official implementation of "Fast-dLLM: Training-free Acceleration of Diffusion LLM by Enabling KV Cache and Parallel Decoding" ★696 · Updated last month
- [ICML 2025] XAttention: Block Sparse Attention with Antidiagonal Scoring ★255 · Updated 4 months ago
- Efficient Triton implementations for "Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention" ★925 · Updated 8 months ago
- An open-source implementation of Regional Adaptive Sampling (RAS), a novel diffusion model sampling strategy that introduces regional var… ★146 · Updated 5 months ago
- Light Video Generation Inference Framework ★816 · Updated last week
- VeOmni: Scaling Any Modality Model Training with Model-Centric Distributed Recipe Zoo ★1,338 · Updated this week
- High performance inference engine for diffusion models ★95 · Updated 2 months ago
- Efficient Triton implementation of Native Sparse Attention. ★250 · Updated 6 months ago
- [CVPR 2024] DeepCache: Accelerating Diffusion Models for Free (see the DeepCache usage sketch after this list) ★944 · Updated last year
- [ICCV 2023] Q-Diffusion: Quantizing Diffusion Models. ★365 · Updated last year
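The TaylorSeers entry above replaces naive cache reuse with forecasting: rather than replaying a stale feature, it extrapolates it along the sampling trajectory. A first-order sketch of that idea, with `taylor_forecast` as a hypothetical helper (the paper uses higher-order expansions and its own caching schedule):

```python
import torch

def taylor_forecast(f_prev: torch.Tensor, f_curr: torch.Tensor, k: int) -> torch.Tensor:
    """Forecast a cached diffusion feature k steps ahead (first order).

    A finite difference between the two most recent fully computed
    features approximates the feature's derivative over timesteps;
    extrapolating beats reusing f_curr unchanged when features drift.
    """
    d1 = f_curr - f_prev        # first-order finite difference
    return f_curr + k * d1      # Taylor expansion truncated at order 1
```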
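The quantized-attention entry (2-5x over FlashAttention) matches the SageAttention project's description; assuming that is the repo, usage is a one-line drop-in per its README (the exact signature may differ across versions):

```python
import torch
from sageattention import sageattn  # pip install sageattention

# Half-precision tensors on GPU, layout (batch, heads, seq_len, head_dim).
q = torch.randn(1, 16, 4096, 64, dtype=torch.float16, device="cuda")
k = torch.randn_like(q)
v = torch.randn_like(q)

# Drop-in for scaled_dot_product_attention: QK^T runs quantized (INT8)
# with smoothing, claimed to preserve end-to-end quality.
out = sageattn(q, k, v, tensor_layout="HND", is_causal=False)
```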
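DeepCache is likewise training-free and ships a helper for diffusers pipelines. A minimal usage sketch based on its README (parameter names may have changed since):

```python
import torch
from diffusers import StableDiffusionPipeline
from DeepCache import DeepCacheSDHelper  # pip install DeepCache

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Cache the U-Net's deep features for `cache_interval` steps and only
# recompute the shallow branch, trading a little fidelity for speed.
helper = DeepCacheSDHelper(pipe=pipe)
helper.set_params(cache_interval=3, cache_branch_id=0)
helper.enable()
image = pipe("a photo of an astronaut riding a horse").images[0]
helper.disable()
```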