togethercomputer / flash-attention-3
Fast and memory-efficient exact attention
☆21 Updated 10 months ago
Alternatives and similar repositories for flash-attention-3
Users who are interested in flash-attention-3 are comparing it to the libraries listed below.
- faster parallel inference of mochi-1 video generation model ☆125 Updated 7 months ago
- Making Flux go brrr on GPUs. ☆144 Updated 2 months ago
- [NeurIPS 2024] AsyncDiff: Parallelizing Diffusion Models by Asynchronous Denoising ☆205 Updated last week
- DC-Gen: Post-Training Diffusion Acceleration with Deeply Compressed Latent Space ☆183 Updated last week
- [NeurIPS 2025] Radial Attention: O(nlogn) Sparse Attention with Energy Decay for Long Video Generation ☆517 Updated 3 weeks ago
- ☆76 Updated 9 months ago
- [ICLR 2025] FasterCache: Training-Free Video Diffusion Model Acceleration with High Quality ☆249 Updated 9 months ago
- An open-source implementation of Regional Adaptive Sampling (RAS), a novel diffusion model sampling strategy that introduces regional var… ☆143 Updated 3 months ago
- [WIP] Better (FP8) attention for Hopper ☆33 Updated 7 months ago
- [arXiv] On-device Sora: Enabling Diffusion-Based Text-to-Video Generation for Mobile Devices ☆123 Updated 2 months ago
- Code for Draft Attention ☆90 Updated 4 months ago
- Official PyTorch implementation of TokenSet. ☆125 Updated 6 months ago
- Adaptive Caching for Faster Video Generation with Diffusion Transformers ☆159 Updated 11 months ago
- (WIP) Parallel inference for black-forest-labs' FLUX model. ☆19 Updated 10 months ago
- Video-Infinity generates long videos quickly using multiple GPUs without extra training. ☆184 Updated last year
- 🎬 3.7× faster video generation E2E 🖼️ 1.6× faster image generation E2E ⚡ ColumnSparseAttn 9.3× vs FlashAttn‑3 💨 ColumnSparseGEMM 2.5× … ☆86 Updated last month
- This repository provides a minimal, single-file implementation of SingLoRA (Single Matrix Low-Rank Adaptation) as described in the paper … ☆44 Updated this week
- End-to-end recipes for optimizing diffusion models with torchao and diffusers (inference and FP8 training). ☆378 Updated 4 months ago
- Writing FLUX in Triton ☆40 Updated last year
- Inference-time scaling of diffusion-based image and video generation models. ☆169 Updated 3 months ago
- A Unified Cache Acceleration Framework for 🤗 Diffusers: Qwen-Image-Lightning, Qwen-Image, HunyuanImage, FLUX, Wan, etc. ☆378 Updated this week
- A WebUI for Side-by-Side Comparison of Media (Images/Videos) Across Multiple Folders ☆23 Updated 7 months ago
- (CVPR 2025) Switti: Designing Scale-Wise Transformers for Text-to-Image Synthesis ☆194 Updated 2 months ago
- DMM: Building a Versatile Image Generation Model via Distillation-Based Model Merging ☆45 Updated 5 months ago
- Home Made Diffusion Models ☆153 Updated last month
- Recaption large (Web)Datasets with vllm and save the artifacts. ☆52 Updated 10 months ago
- UniDisc: A discrete diffusion model for joint multimodal generation, enabling controllable and efficient text-image synthesis, editing, a… ☆126 Updated 6 months ago
- [ICML2025] LoRA fine-tune directly on the quantized models. ☆35 Updated 10 months ago
- [ICCV2025] From Reusing to Forecasting: Accelerating Diffusion Models with TaylorSeers ☆292 Updated last month
- Scale-wise Distillation of Diffusion Models ☆111 Updated 3 weeks ago