thu-ml / SageAttention
Quantized attention that achieves speedups of 2.1-3.1x and 2.7-5.1x compared to FlashAttention2 and xformers, respectively, without losing end-to-end metrics across various models.
☆1,218 Updated this week
Alternatives and similar repositories for SageAttention:
Users who are interested in SageAttention are comparing it to the libraries listed below:
- SpargeAttention: A training-free sparse attention that can accelerate any model inference. ☆385 Updated 2 weeks ago
- xDiT: A Scalable Inference Engine for Diffusion Transformers (DiTs) with Massive Parallelism ☆1,722 Updated this week
- [ICLR2025 Spotlight] SVDQuant: Absorbing Outliers by Low-Rank Components for 4-Bit Diffusion Models ☆1,033 Updated this week
- Model Compression Toolbox for Large Language Models and Diffusion Models ☆394 Updated last month
- Context parallel attention that accelerates DiT model inference with dynamic caching ☆228 Updated last week
- [CVPR 2024 Highlight] DistriFusion: Distributed Parallel Inference for High-Resolution Diffusion Models ☆670 Updated 3 months ago
- Timestep Embedding Tells: It's Time to Cache for Video Diffusion Model ☆601 Updated 2 weeks ago
- 📖 A curated list of Awesome Diffusion Inference Papers with codes: Sampling, Caching, Multi-GPUs, etc. 🎉🎉 ☆201 Updated last week
- FastVideo is a lightweight framework for accelerating large video diffusion models. ☆1,283 Updated this week
- ☆272 Updated 3 months ago
- [CVPR 2024] DeepCache: Accelerating Diffusion Models for Free ☆875 Updated 9 months ago
- Fork of the Triton language and compiler for Windows support and easy installation ☆755 Updated this week
- Infinity ∞ : Scaling Bitwise AutoRegressive Modeling for High-Resolution Image Synthesis ☆1,031 Updated last month
- End-to-end recipes for optimizing diffusion models with torchao and diffusers (inference and FP8 training). ☆335 Updated last month
- Enhance-A-Video: Better Generated Video for Free ☆483 Updated 2 weeks ago
- A pipeline parallel training script for diffusion models. ☆795 Updated this week
- ☆511 Updated 2 months ago
- ☆140 Updated this week
- 📚 Collection of awesome generation acceleration resources. ☆182 Updated this week
- Next-Token Prediction is All You Need ☆2,042 Updated 2 weeks ago
- VideoSys: An easy and efficient system for video generation ☆1,947 Updated 3 weeks ago
- 🐳 Efficient Triton implementations for "Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention" ☆601 Updated last week
- Memory-optimized training library for diffusion models ☆995 Updated last week
- ☆155 Updated 2 months ago
- https://wavespeed.ai/ Best inference performance optimization framework for HuggingFace Diffusers on NVIDIA GPUs. ☆1,244 Updated this week
- Efficient LLM Inference over Long Sequences ☆365 Updated last month
- [ICCV 2023] Q-Diffusion: Quantizing Diffusion Models. ☆347 Updated last year
- Muon is Scalable for LLM Training ☆993 Updated this week
- Official Implementation of "Lumina-mGPT: Illuminate Flexible Photorealistic Text-to-Image Generation with Multimodal Generative Pretraini… ☆551 Updated 7 months ago
- ☆449 Updated 4 months ago