xlite-dev / Awesome-DiT-Inference
A curated list of Awesome Diffusion Inference Papers with Codes: Sampling, Caching, Quantization, Parallelism, etc.
☆283 · Updated this week
Alternatives and similar repositories for Awesome-DiT-Inference
Users interested in Awesome-DiT-Inference are comparing it to the repositories listed below.
- Model Compression Toolbox for Large Language Models and Diffusion Models · ☆501 · Updated 2 months ago
- ☆167 · Updated 5 months ago
- SpargeAttention: A training-free sparse attention that can accelerate any model inference. · ☆620 · Updated last week
- Collection of awesome generation acceleration resources. · ☆270 · Updated 2 months ago
- USP: Unified (a.k.a. Hybrid, 2D) Sequence Parallel Attention for Long Context Transformers Model Training and Inference · ☆519 · Updated last month
- A sparse attention kernel supporting mixed sparse patterns · ☆238 · Updated 4 months ago
- A Distributed Attention Towards Linear Scalability for Ultra-Long Context, Heterogeneous Data Training · ☆385 · Updated this week
- [ICLR'25] ViDiT-Q: Efficient and Accurate Quantization of Diffusion Transformers for Image and Video Generation · ☆102 · Updated 3 months ago
- [ICML 2025] Sparse VideoGen: Accelerating Video Diffusion Transformers with Spatial-Temporal Sparsity · ☆344 · Updated 2 weeks ago
- [CVPR 2024 Highlight] DistriFusion: Distributed Parallel Inference for High-Resolution Diffusion Models · ☆690 · Updated 6 months ago
- [ICLR 2025] COAT: Compressing Optimizer States and Activation for Memory-Efficient FP8 Training · ☆210 · Updated last week
- Accelerating Diffusion Transformers with Token-wise Feature Caching · ☆159 · Updated 3 months ago
- A parallel VAE that avoids OOM in high-resolution image generation · ☆64 · Updated 5 months ago
- Puzzles for learning Triton; play them with minimal environment configuration! · ☆367 · Updated 6 months ago
- FlashAttention tutorial written in Python, Triton, CUDA, and CUTLASS · ☆377 · Updated last month
- Context-parallel attention that accelerates DiT model inference with dynamic caching (https://wavespeed.ai/) · ☆304 · Updated last month
- [ICCV 2023] Q-Diffusion: Quantizing Diffusion Models. · ☆350 · Updated last year
- From Reusing to Forecasting: Accelerating Diffusion Models with TaylorSeers · ☆179 · Updated last month
- XAttention: Block Sparse Attention with Antidiagonal Scoring · ☆166 · Updated this week
- Must-read papers on KV Cache Compression (constantly updating). · ☆459 · Updated this week
- VeOmni: Scaling any Modality Model Training to any Accelerators with PyTorch native Training Framework · ☆355 · Updated last month
- An open-source implementation of Regional Adaptive Sampling (RAS), a novel diffusion model sampling strategy that introduces regional var… · ☆130 · Updated 4 months ago
- [ICML 2024] Quest: Query-Aware Sparsity for Efficient Long-Context LLM Inference · ☆297 · Updated 7 months ago
- A list of papers, docs, and code about efficient AIGC. This repo aims to provide information for efficient AIGC research, including languag… · ☆184 · Updated 4 months ago
- A Suite for Parallel Inference of Diffusion Transformers (DiTs) on multi-GPU Clusters · ☆47 · Updated 11 months ago
- A collection of memory-efficient attention operators implemented in the Triton language. · ☆272 · Updated last year
- Ring attention implementation with flash attention · ☆789 · Updated 2 weeks ago
- [CVPR 2024 Highlight & TPAMI 2025] This is the official PyTorch implementation of "TFMQ-DM: Temporal Feature Maintenance Quantization for…" · ☆65 · Updated last week
- Official implementation of "Fast-dLLM: Training-free Acceleration of Diffusion LLM by Enabling KV Cache and Parallel Decoding" · ☆233 · Updated 2 weeks ago
- Distributed Compiler Based on Triton for Parallel Systems · ☆846 · Updated last week