thu-ml / SpargeAttn
SpargeAttention: A training-free sparse attention that can accelerate any model inference.
☆453 · Updated this week
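To make the tagline concrete, here is a minimal sketch of the general idea behind training-free block-sparse attention: cheap block-level statistics predict which query/key block pairs carry most of the attention mass, and only those are attended, with no retraining of the model. This is an illustrative approximation, not SpargeAttn's actual kernel or API; `block_sparse_attention`, `block_size`, and `keep_ratio` are hypothetical names, and a real sparse kernel would skip the pruned blocks instead of materializing a dense mask.

```python
import torch
import torch.nn.functional as F

def block_sparse_attention(q, k, v, block_size=64, keep_ratio=0.3):
    # Hypothetical sketch. q, k, v: (batch, heads, seq_len, head_dim),
    # with seq_len divisible by block_size.
    B, H, N, D = q.shape
    nb = N // block_size
    # Mean-pool each block of queries/keys to cheaply estimate which
    # (query-block, key-block) pairs carry significant attention mass.
    q_blk = q.view(B, H, nb, block_size, D).mean(dim=3)
    k_blk = k.view(B, H, nb, block_size, D).mean(dim=3)
    block_scores = q_blk @ k_blk.transpose(-1, -2) / D ** 0.5  # (B, H, nb, nb)
    # Keep only the top-k key blocks per query block; prune the rest.
    k_keep = max(1, int(keep_ratio * nb))
    top = block_scores.topk(k_keep, dim=-1).indices
    block_mask = torch.zeros_like(block_scores, dtype=torch.bool)
    block_mask.scatter_(-1, top, True)
    # Expand the block mask to token resolution; a real sparse kernel
    # would skip masked blocks rather than build this full mask.
    mask = block_mask.repeat_interleave(block_size, dim=2)
    mask = mask.repeat_interleave(block_size, dim=3)
    return F.scaled_dot_product_attention(q, k, v, attn_mask=mask)
```

Because block selection uses only the activations at inference time, no fine-tuning is required, which is what "training-free" refers to.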
Alternatives and similar repositories for SpargeAttn:
Users interested in SpargeAttn are comparing it to the libraries listed below.
- Model Compression Toolbox for Large Language Models and Diffusion Models ☆421 · Updated 3 weeks ago
- ☆170 · Updated this week
- Quantized Attention that achieves speedups of 2.1-3.1x and 2.7-5.1x compared to FlashAttention2 and xformers, respectively, without lossi… ☆1,329 · Updated this week
- Context-parallel attention that accelerates DiT model inference with dynamic caching (https://wavespeed.ai/) ☆243 · Updated 2 weeks ago
- ☆157 · Updated 3 months ago
- 📖 A curated list of Awesome Diffusion Inference Papers with code: Sampling, Caching, Multi-GPU, etc. 🎉🎉 ☆210 · Updated 3 weeks ago
- Accelerating Diffusion Transformers with Token-wise Feature Caching ☆130 · Updated last month
- [CVPR 2024 Highlight] DistriFusion: Distributed Parallel Inference for High-Resolution Diffusion Models ☆674 · Updated 4 months ago
- An open-source implementation of Regional Adaptive Sampling (RAS), a novel diffusion-model sampling strategy that introduces regional var… ☆125 · Updated 2 months ago
- A parallel VAE that avoids OOM in high-resolution image generation ☆61 · Updated 2 months ago
- 📚 A collection of awesome generation-acceleration resources. ☆202 · Updated this week
- Adaptive Caching for Faster Video Generation with Diffusion Transformers ☆145 · Updated 5 months ago
- XAttention: Block Sparse Attention with Antidiagonal Scoring ☆140 · Updated 3 weeks ago
- USP: Unified (a.k.a. Hybrid, 2D) Sequence Parallel Attention for Long-Context Transformer Model Training and Inference ☆473 · Updated last week
- A sparse attention kernel supporting mixed sparse patterns ☆192 · Updated 2 months ago
- From Reusing to Forecasting: Accelerating Diffusion Models with TaylorSeers ☆71 · Updated 3 weeks ago
- xDiT: A Scalable Inference Engine for Diffusion Transformers (DiTs) with Massive Parallelism ☆1,846 · Updated last week
- Official PyTorch and Diffusers Implementation of "LinFusion: 1 GPU, 1 Minute, 16K Image" ☆300 · Updated 3 months ago
- Scaling Diffusion Transformers with Mixture of Experts ☆311 · Updated 7 months ago
- [ICLR 2025] FasterCache: Training-Free Video Diffusion Model Acceleration with High Quality ☆210 · Updated 3 months ago
- [NeurIPS 2024] AsyncDiff: Parallelizing Diffusion Models by Asynchronous Denoising ☆196 · Updated last month
- ☆106 · Updated this week
- [ICLR 2025] ViDiT-Q: Efficient and Accurate Quantization of Diffusion Transformers for Image and Video Generation ☆76 · Updated last month
- Official Implementation of "Lumina-mGPT: Illuminate Flexible Photorealistic Text-to-Image Generation with Multimodal Generative Pretraini… ☆584 · Updated 2 weeks ago
- [NeurIPS 2024] Learning-to-Cache: Accelerating Diffusion Transformer via Layer Caching ☆101 · Updated 9 months ago
- [CVPR 2025 Oral] Infinity ∞: Scaling Bitwise AutoRegressive Modeling for High-Resolution Image Synthesis ☆1,180 · Updated last month
- End-to-end recipes for optimizing diffusion models with torchao and diffusers (inference and FP8 training). ☆340 · Updated 2 months ago
- VeOmni: Scaling any-modality model training to any accelerator with a PyTorch-native training framework ☆297 · Updated 2 weeks ago
- [CVPR 2024] DeepCache: Accelerating Diffusion Models for Free ☆886 · Updated 9 months ago
- [ICLR 2025] COAT: Compressing Optimizer States and Activations for Memory-Efficient FP8 Training ☆182 · Updated last week