thu-ml / SpargeAttn
SpargeAttention: A training-free sparse attention method that can accelerate inference for any model.
☆385 · Updated 2 weeks ago
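SpargeAttn belongs to the block-sparse attention family: a cheap coarse pass predicts which blocks of the attention matrix matter, and only those blocks are computed. The sketch below illustrates that general idea in plain PyTorch; the block size, the mean-pooling predictor, `keep_ratio`, and the `block_sparse_attention` helper are illustrative assumptions, not SpargeAttn's actual algorithm or API.

```python
# Minimal sketch of training-free block-sparse attention (illustrative only).
import torch
import torch.nn.functional as F

def block_sparse_attention(q, k, v, block=64, keep_ratio=0.25):
    """q, k, v: (batch, heads, seq, dim); seq must be a multiple of `block`."""
    B, H, S, D = q.shape
    nb = S // block
    # Mean-pool each block as a cheap proxy for its contents.
    q_blk = q.view(B, H, nb, block, D).mean(dim=3)        # (B, H, nb, D)
    k_blk = k.view(B, H, nb, block, D).mean(dim=3)
    # Coarse block-level scores predict which fine-grained blocks matter.
    coarse = q_blk @ k_blk.transpose(-1, -2) / D ** 0.5   # (B, H, nb, nb)
    keep = max(1, int(keep_ratio * nb))
    top = coarse.topk(keep, dim=-1).indices               # (B, H, nb, keep)
    # Expand the kept blocks to a token-level boolean mask.
    blk_mask = torch.zeros(B, H, nb, nb, dtype=torch.bool, device=q.device)
    blk_mask.scatter_(-1, top, True)
    mask = blk_mask.repeat_interleave(block, dim=2).repeat_interleave(block, dim=3)
    # Dense fallback for clarity; a real kernel never materializes masked blocks.
    scores = (q @ k.transpose(-1, -2)) / D ** 0.5
    scores = scores.masked_fill(~mask, float("-inf"))
    return F.softmax(scores, dim=-1) @ v

q = torch.randn(1, 8, 256, 64)
out = block_sparse_attention(q, q, q)  # self-attention smoke test
print(out.shape)  # torch.Size([1, 8, 256, 64])
```

A real kernel (as in SpargeAttn or the block-sparse kernels listed below) skips the masked blocks entirely rather than masking a dense score matrix, which is where the speedup comes from.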
Alternatives and similar repositories for SpargeAttn:
Users who are interested in SpargeAttn are comparing it to the libraries listed below.
- Model Compression Toolbox for Large Language Models and Diffusion Models ☆394 · Updated last month
- https://wavespeed.ai/ Context parallel attention that accelerates DiT model inference with dynamic caching ☆229 · Updated this week
- Accelerating Diffusion Transformers with Token-wise Feature Caching ☆115 · Updated 2 weeks ago (a generic sketch of the caching idea follows this list)
- Quantized Attention that achieves speedups of 2.1-3.1x and 2.7-5.1x compared to FlashAttention2 and xformers, respectively, without losing end-to-end metrics across various models ☆1,218 · Updated this week
- 📖 A curated list of Awesome Diffusion Inference Papers with codes: Sampling, Caching, Multi-GPUs, etc. 🎉🎉 ☆201 · Updated last week
- An open-source implementation of Regional Adaptive Sampling (RAS), a novel diffusion model sampling strategy that introduces regional variability in sampling steps ☆121 · Updated last month
- [CVPR 2024 Highlight] DistriFusion: Distributed Parallel Inference for High-Resolution Diffusion Models ☆670 · Updated 3 months ago
- 📚 Collection of awesome generation acceleration resources. ☆182 · Updated this week
- Adaptive Caching for Faster Video Generation with Diffusion Transformers ☆142 · Updated 4 months ago
- [ICLR'25] ViDiT-Q: Efficient and Accurate Quantization of Diffusion Transformers for Image and Video Generation ☆69 · Updated last week
- Official PyTorch and Diffusers Implementation of "LinFusion: 1 GPU, 1 Minute, 16K Image" ☆297 · Updated 3 months ago
- A sparse attention kernel supporting mixed sparse patterns ☆169 · Updated last month
- A parallel VAE that avoids OOM for high-resolution image generation ☆57 · Updated 2 months ago
- USP: Unified (a.k.a. Hybrid, 2D) Sequence Parallel Attention for Long Context Transformers Model Training and Inference ☆458 · Updated last week
- [NeurIPS 2024] AsyncDiff: Parallelizing Diffusion Models by Asynchronous Denoising ☆193 · Updated last month
- From Reusing to Forecasting: Accelerating Diffusion Models with TaylorSeers ☆50 · Updated 2 weeks ago
- Scaling Diffusion Transformers with Mixture of Experts ☆300 · Updated 6 months ago
- HART: Efficient Visual Generation with Hybrid Autoregressive Transformer ☆482 · Updated 5 months ago
- XAttention: Block Sparse Attention with Antidiagonal Scoring ☆118 · Updated this week
- [ICLR 2025] FasterCache: Training-Free Video Diffusion Model Acceleration with High Quality ☆205 · Updated 3 months ago
- [ICLR 2025 Spotlight] SVDQuant: Absorbing Outliers by Low-Rank Components for 4-Bit Diffusion Models ☆1,060 · Updated last week
- [NeurIPS 2024] Learning-to-Cache: Accelerating Diffusion Transformer via Layer Caching ☆98 · Updated 8 months ago
- xDiT: A Scalable Inference Engine for Diffusion Transformers (DiTs) with Massive Parallelism ☆1,722 · Updated this week
- Official PyTorch implementation of the paper "CLEAR: Conv-Like Linearization Revs Pre-Trained Diffusion Transformers Up" ☆201 · Updated last month
- Block Diffusion: Interpolating Between Autoregressive and Diffusion Language Models ☆439 · Updated this week
- 🐳 Efficient Triton implementations for "Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention" ☆601 · Updated last week
- (ToCa-v2) A new version of ToCa with faster speed and better acceleration! ☆30 · Updated 2 weeks ago
- [ICLR 2025] Autoregressive Video Generation without Vector Quantization ☆429 · Updated this week
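Several entries above (Token-wise Feature Caching/ToCa, Adaptive Caching, FasterCache, Learning-to-Cache) share one idea: across adjacent diffusion denoising steps, transformer-block activations change slowly, so a block's cached output can be reused when its input has barely drifted. Below is a minimal, generic sketch of that idea; `CachedBlock`, the `tol` threshold, and the input-drift test are illustrative assumptions and do not reproduce any listed repo's method.

```python
# Generic feature-caching sketch for iterative (e.g. diffusion) inference.
import torch
import torch.nn as nn

class CachedBlock(nn.Module):
    def __init__(self, block: nn.Module, tol: float = 0.05):
        super().__init__()
        self.block, self.tol = block, tol
        self._in, self._out = None, None  # cached input/output from the last step

    def forward(self, x):
        if self._in is not None:
            # Relative change of the input since the cached step.
            drift = (x - self._in).norm() / (self._in.norm() + 1e-8)
            if drift < self.tol:
                return self._out  # input barely moved: reuse the cached output
        y = self.block(x)
        self._in, self._out = x.detach(), y.detach()
        return y

# Smoke test: the second call, with a nearly identical input, hits the cache.
blk = CachedBlock(nn.Sequential(nn.Linear(64, 64), nn.GELU()))
x = torch.randn(4, 64)
y1 = blk(x)
y2 = blk(x + 1e-4 * torch.randn(4, 64))
print(torch.allclose(y1, y2))  # True: the cached output was reused
```

The listed repos differ mainly in what they cache (tokens, layers, whole blocks) and how they decide when a cached value is stale; a fixed drift threshold as above is only the simplest possible policy.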