thu-ml / SageAttentionLinks

[ICLR2025, ICML2025, NeurIPS2025 Spotlight] Quantized Attention achieves speedup of 2-5x compared to FlashAttention, without losing end-to-end metrics across language, image, and video models.

☆2,709

Alternatives and similar repositories for SageAttention

Users that are interested in SageAttention are comparing it to the libraries listed below

Sorting:

xdit-project / xDiT
xDiT: A Scalable Inference Engine for Diffusion Transformers (DiTs) with Massive Parallelism
☆2,428Updated last week
nunchaku-tech / nunchaku
[ICLR2025 Spotlight] SVDQuant: Absorbing Outliers by Low-Rank Components for 4-Bit Diffusion Models
☆3,381Updated last week
thu-ml / SpargeAttn
[ICML2025] SpargeAttention: A training-free sparse attention that accelerates any model inference.
☆785Updated last week
nunchaku-tech / deepcompressor
Model Compression Toolbox for Large Language Models and Diffusion Models
☆698Updated 3 months ago
ali-vilab / TeaCache
Timestep Embedding Tells: It's Time to Cache for Video Diffusion Model
☆1,172Updated 5 months ago
hao-ai-lab / FastVideo
A unified inference and post-training framework for accelerated video generation.
☆2,659Updated this week
ModelTC / LightX2V
Light Video Generation Inference Framework
☆816Updated last week
chengzeyi / ParaAttention
https://wavespeed.ai/ Context parallel attention that accelerates DiT model inference with dynamic caching
☆388Updated 4 months ago
tdrussell / diffusion-pipe
A pipeline parallel training script for diffusion models.
☆1,731Updated 2 weeks ago
woct0rdho / triton-windows
Fork of the Triton language and compiler for Windows support and easy installation
☆1,583Updated last week
vipshop / cache-dit
🤗A PyTorch-native Inference Engine with Hybrid Cache Acceleration and Parallelism for DiTs.
☆588Updated this week
xlite-dev / Awesome-DiT-Inference
📚A curated list of Awesome Diffusion Inference Papers with Codes: Sampling, Cache, Quantization, Parallelism, etc.🎉
☆447Updated 3 months ago
ModelTC / Qwen-Image-Lightning
Qwen-Image-Lightning: Speed up Qwen-Image model with distillation
☆997Updated last month
huggingface / finetrainers
Scalable and memory-optimized training of diffusion models
☆1,303Updated 5 months ago
mit-han-lab / radial-attention
[NeurIPS 2025] Radial Attention: O(nlogn) Sparse Attention with Energy Decay for Long Video Generation
☆558Updated 2 weeks ago
kohya-ss / musubi-tuner
☆1,461Updated last week
mit-han-lab / distrifuser
[CVPR 2024 Highlight] DistriFusion: Distributed Parallel Inference for High-Resolution Diffusion Models
☆714Updated 11 months ago
horseee / DeepCache
[CVPR 2024] DeepCache: Accelerating Diffusion Models for Free
☆944Updated last year
svg-project / Sparse-VideoGen
[ICML2025, NeurIPS2025 Spotlight] Sparse VideoGen 1 & 2: Accelerating Video Diffusion Transformers with Sparse Attention
☆585Updated last week
tianweiy / CausVid
(CVPR 2025) From Slow Bidirectional to Fast Autoregressive Video Diffusion Models
☆1,058Updated 3 months ago
SandAI-org / MagiAttention
A Distributed Attention Towards Linear Scalability for Ultra-Long Context, Heterogeneous Data Training
☆562Updated this week
chengzeyi / stable-fast
https://wavespeed.ai/ Best inference performance optimization framework for HuggingFace Diffusers on NVIDIA GPUs.
☆1,291Updated 8 months ago
FoundationVision / Infinity
[CVPR 2025 Oral]Infinity ∞ : Scaling Bitwise AutoRegressive Modeling for High-Resolution Image Synthesis
☆1,503Updated 2 weeks ago
aigc-apps / VideoX-Fun
📹 A more flexible framework that can generate videos at any resolution and creates videos from images.
☆1,550Updated last week
stepfun-ai / Step1X-Edit
A SOTA open-source image editing model, which aims to provide comparable performance against the closed-source models like GPT-4o and Gem…
☆1,742Updated 2 months ago
sayakpaul / diffusers-torchao
End-to-end recipes for optimizing diffusion models with torchao and diffusers (inference and FP8 training).
☆385Updated 5 months ago
Shenyi-Z / TaylorSeer
[ICCV2025] From Reusing to Forecasting: Accelerating Diffusion Models with TaylorSeers
☆334Updated 3 months ago
WeichenFan / CFG-Zero-star
Official repo for CFG-Zero*
☆680Updated 6 months ago
LeanModels / DFloat11
DFloat11: Lossless LLM Compression for Efficient GPU Inference
☆562Updated this week
chengzeyi / Comfy-WaveSpeed
https://wavespeed.ai/ [WIP] The all in one inference optimization solution for ComfyUI, universal, flexible, and fast.
☆1,192Updated 3 months ago