vipshop/cache-dit

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/vipshop/cache-dit)

vipshop / cache-dit

A PyTorch-native inference engine with cache, parallelism, quantization and cpu offload for DiTs.

☆1,234

Alternatives and similar repositories for cache-dit

Users that are interested in cache-dit are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

xlite-dev / Awesome-DiT-Inference
View on GitHub
📚A curated list of Awesome Diffusion Inference Papers with Codes: Sampling, Cache, Quantization, Parallelism, etc.🎉
☆578Jun 13, 2026Updated last month
hao-ai-lab / FastVideo
View on GitHub
A unified inference and post-training framework for accelerated video generation.
☆3,869Updated this week
ali-vilab / TeaCache
View on GitHub
Timestep Embedding Tells: It's Time to Cache for Video Diffusion Model
☆1,357Jun 8, 2025Updated last year
ModelTC / LightX2V
View on GitHub
Lightweight Image Video Action Generation Inference Framework
☆2,517Updated this week
Shenyi-Z / TaylorSeer
View on GitHub
[ICCV2025] From Reusing to Forecasting: Accelerating Diffusion Models with TaylorSeers
☆407Mar 2, 2026Updated 4 months ago
Deploy to Railway using AI coding agents - Free Credits Offer • Ad
Use Claude Code, Codex, OpenCode, and more. Autonomous software development now has the infrastructure to match with Railway.
xdit-project / xDiT
View on GitHub
xDiT: A Scalable Inference Engine for Diffusion Transformers (DiTs) with Massive Parallelism
☆2,659Jul 14, 2026Updated last week
chengzeyi / ParaAttention
View on GitHub
https://wavespeed.ai/ Context parallel attention that accelerates DiT model inference with dynamic caching
☆427Jul 5, 2025Updated last year
UnicomAI / LeMiCa
View on GitHub
[NeurIPS 2025 Spotlight] LeMiCa: Lexicographic Minimax Path Caching for Efficient Diffusion-Based Video Generation
☆122Jun 22, 2026Updated last month
thu-ml / TurboDiffusion
View on GitHub
TurboDiffusion: 100–200× Acceleration for Video Diffusion Models
☆3,578Jul 16, 2026Updated last week
thu-ml / SageAttention
View on GitHub
[ICLR2025, ICML2025, NeurIPS2025 Spotlight] Quantized Attention achieves speedup of 2-5x compared to FlashAttention, without losing end-t…
☆3,500Jan 17, 2026Updated 6 months ago
nunchaku-ai / nunchaku
View on GitHub
[ICLR2025 Spotlight] SVDQuant: Absorbing Outliers by Low-Rank Components for 4-Bit Diffusion Models
☆3,911Mar 7, 2026Updated 4 months ago
svg-project / Sparse-VideoGen
View on GitHub
[ICML2025, NeurIPS2025 Spotlight] Sparse VideoGen 1 & 2: Accelerating Video Diffusion Transformers with Sparse Attention
☆694Jul 4, 2026Updated 2 weeks ago
thu-ml / SpargeAttn
View on GitHub
[ICML2025] SpargeAttention: A training-free sparse attention that accelerates any model inference.
☆1,017Feb 25, 2026Updated 4 months ago
ByteDance-Seed / Triton-distributed
View on GitHub
Distributed Compiler based on Triton for Parallel Systems
☆1,495Updated this week
Managed Kubernetes at scale on DigitalOcean • Ad
DigitalOcean Kubernetes includes the control plane, bandwidth allowance, container registry, automatic updates, and more for free.
ByteDance-Seed / VeOmni
View on GitHub
VeOmni: Scaling Any Modality Model Training with Model-Centric Distributed Recipe Zoo
☆2,102Updated this week
xlite-dev / ffpa-attn
View on GitHub
🤖FFPA: Extends FA-2/3 via Split-D for large headdims, 1.5x~6×↑🎉 vs SDPA, up to 513~535 TFLOPS🎉 on NVIDIA H200.
☆315Updated this week
flashinfer-ai / flashinfer
View on GitHub
FlashInfer: Kernel Library for LLM Serving
☆6,001Updated this week
huggingface / flux-fast
View on GitHub
Making Flux go brrr on GPUs.
☆171Jan 5, 2026Updated 6 months ago
GoatWu / Self-Forcing-Plus
View on GitHub
Unofficial extension implementation of Self-Forcing to support I2V && 14B training.
☆380Sep 29, 2025Updated 9 months ago
Tammytcl / Awesome-Diffusion-Acceleration-Cache
View on GitHub
A curated list of research papers, resources, and advancements on Diffusion Cache and related efficient diffusion model acceleration tech…
☆86Nov 4, 2025Updated 8 months ago
NVlabs / rcm
View on GitHub
rCM & Causal-rCM: Leading and Unified Algorithms/Infrastructures for Bidirectional/Autoregressive Video Diffusion Distillation at Scale
☆770Jun 25, 2026Updated 3 weeks ago
nunchaku-ai / deepcompressor
View on GitHub
Model Compression Toolbox for Large Language Models and Diffusion Models
☆795Aug 14, 2025Updated 11 months ago
mit-han-lab / radial-attention
View on GitHub
[NeurIPS 2025] Radial Attention: O(nlogn) Sparse Attention with Energy Decay for Long Video Generation
☆604Nov 11, 2025Updated 8 months ago
End-to-end encrypted email - Proton Mail • Ad
Special offer: 40% Off Yearly / 80% Off First Month. All Proton services are open source and independently audited for security.
SandAI-org / MagiAttention
View on GitHub
A Distributed Attention Towards Linear Scalability for Ultra-Long Context, Heterogeneous Data Training
☆887Updated this week
tianweiy / DMD2
View on GitHub
(NeurIPS 2024 Oral 🔥) Improved Distribution Matching Distillation for Fast Image Synthesis
☆1,405Mar 5, 2025Updated last year
ModelTC / LightX2V-Qwen-Image-Lightning
View on GitHub
Qwen-Image-Lightning: Speed up Qwen-Image model with distillation
☆1,340Jan 1, 2026Updated 6 months ago
vllm-project / vllm-omni
View on GitHub
A framework for efficient model inference with omni-modality models
☆5,643Updated this week
RiseAI-Sys / DAX
View on GitHub
High performance inference engine for diffusion models
☆107Sep 5, 2025Updated 10 months ago
yifan123 / flow_grpo
View on GitHub
[NeurIPS 2025] An official implementation of Flow-GRPO: Training Flow Matching Models via Online RL
☆2,426May 7, 2026Updated 2 months ago
Shenyi-Z / Cache4Diffusion
View on GitHub
Aiming to integrate most existing feature caching-based diffusion acceleration schemes into a unified framework.
☆110Oct 23, 2025Updated 9 months ago
modelscope / DiffSynth-Engine
View on GitHub
☆426Jul 8, 2026Updated 2 weeks ago
Jasonzzt / ComfyUI-CacheDiT
View on GitHub
Cache-DiT Node for Comfyui
☆296Apr 15, 2026Updated 3 months ago
GPUs on demand by Runpod - Special Offer Available • Ad
Run AI, ML, and HPC workloads on powerful cloud GPUs—without limits or wasted spend. Deploy GPUs in under a minute and pay by the second.
xlite-dev / flux-faster
View on GitHub
A forked version of flux-fast that makes flux-fast even faster with cache-dit, 3.3x speedup on NVIDIA L20.
☆24Jul 18, 2025Updated last year
xdit-project / DistVAE
View on GitHub
A parallelism VAE avoids OOM for high resolution image generation
☆95May 8, 2026Updated 2 months ago
Bujiazi / DiCache
View on GitHub
[ICLR 2026] Official implementation of DiCache: Let Diffusion Model Determine Its Own Cache
☆61Jan 26, 2026Updated 5 months ago
Lakonik / LakonLab
View on GitHub
Official implementation of AsymFlow, pi-Flow, GMFlow
☆451Jul 14, 2026Updated last week
Zehong-Ma / MagCache
View on GitHub
The official code for NeurIPS 2025 "MagCache: Fast Video Generation with Magnitude-Aware Cache"
☆275Nov 17, 2025Updated 8 months ago
bytedance / flux
View on GitHub
A fast communication-overlapping library for tensor/expert parallelism on GPUs.
☆1,344Aug 28, 2025Updated 10 months ago
tianweiy / CausVid
View on GitHub
(CVPR 2025) From Slow Bidirectional to Fast Autoregressive Video Diffusion Models
☆1,404Aug 7, 2025Updated 11 months ago