🎬 3.7× faster video generation E2E 🖼️ 1.6× faster image generation E2E ⚡ ColumnSparseAttn 9.3× vs FlashAttn‑3 💨 ColumnSparseGEMM 2.5× vs cuBLAS
☆101, updated Sep 8, 2025
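chipmunk's ColumnSparseAttn is a fused CUDA kernel; as a rough intuition for what "column sparsity" means in attention, here is a toy NumPy sketch that keeps only the key columns receiving the most total attention mass and renormalizes before attending. The function name and `keep_ratio` parameter are illustrative, not chipmunk's API, and this dense-math version gains no actual speedup.

```python
import numpy as np

def column_sparse_attention(Q, K, V, keep_ratio=0.25):
    """Toy sketch of column-sparse attention: score all keys,
    keep only the top-k key columns by total attention mass,
    renormalize the rows, and attend. Illustrative only."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                      # (n_q, n_k)
    probs = np.exp(scores - scores.max(axis=-1, keepdims=True))
    probs /= probs.sum(axis=-1, keepdims=True)         # row-wise softmax
    col_mass = probs.sum(axis=0)                       # attention each key receives
    k = max(1, int(round(keep_ratio * K.shape[0])))
    keep = np.argsort(col_mass)[-k:]                   # indices of top-k columns
    sparse = np.zeros_like(probs)
    sparse[:, keep] = probs[:, keep]                   # zero out dropped columns
    sparse /= sparse.sum(axis=-1, keepdims=True)       # renormalize rows
    return sparse @ V
```

With `keep_ratio=1.0` every column is kept and the result reduces to ordinary softmax attention; a real kernel exploits the dropped columns to skip memory traffic and compute entirely.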
Alternatives and similar repositories for chipmunk
Users interested in chipmunk are comparing it to the libraries listed below.
- An auxiliary project analyzing the characteristics of KV in DiT attention (☆33, updated Nov 29, 2024)
- Low-overhead tracing library and trace visualizer for pipelined CUDA kernels (☆131, updated Nov 26, 2025)
- Fast low-bit matmul kernels in Triton (☆433, updated Feb 1, 2026)
- ☆32, updated Jul 2, 2025
- A sparse attention kernel supporting mixed sparse patterns (☆467, updated Jan 18, 2026)
- EleutherAI ML Performance reading group repository (slides, meeting recordings, annotated papers) (☆26, updated Dec 19, 2025)
- [NeurIPS 2024] AsyncDiff: Parallelizing Diffusion Models by Asynchronous Denoising (☆212, updated Sep 27, 2025)
- Official implementation of the paper "Grouping First, Attending Smartly: Training-Free Acceleration for Diff…" (☆55, updated May 21, 2025)
- [ICML 2025, NeurIPS 2025 Spotlight] Sparse VideoGen 1 & 2: Accelerating Video Diffusion Transformers with Sparse Attention (☆629, updated Feb 3, 2026)
- [ICLR 2026] Official implementation of DiCache: Let Diffusion Model Determine Its Own Cache (☆55, updated Jan 26, 2026)
- https://wavespeed.ai/ Context-parallel attention that accelerates DiT model inference with dynamic caching (☆424, updated Jul 5, 2025)
- Perplexity GPU Kernels (☆567, updated Nov 7, 2025)
- Code for Draft Attention (☆99, updated May 22, 2025)
- [ICML 2025] SpargeAttention: a training-free sparse attention that accelerates any model's inference (☆952, updated this week)
- Quantized attention on GPU (☆44, updated Nov 22, 2024)
- ☆27, updated Aug 25, 2023
- A parallel VAE that avoids OOM during high-resolution image generation (☆85, updated Aug 4, 2025)
- A Quirky Assortment of CuTe Kernels (☆814, updated Feb 23, 2026)
- A distributed attention aiming at linear scalability for ultra-long-context, heterogeneous-data training (☆650, updated this week)
- An experimental communicating attention kernel based on DeepEP (☆35, updated Jul 29, 2025)
- Aims to integrate most existing feature-caching diffusion acceleration schemes into a unified framework (☆89, updated Oct 23, 2025)
- ☆261, updated Jul 11, 2024
- GitHub repository for the Bria 3.2 pipeline (☆44, updated Sep 10, 2025)
- ☆52, updated May 19, 2025
- Fastest kernels written from scratch (☆548, updated Sep 18, 2025)
- ☆32, updated Nov 11, 2024
- [IEEE/CVF CVPR 2022] "ST-MFNet: A Spatio-Temporal Multi-Flow Network for Frame Interpolation", Duolikun Danier, Fan Zhang, David Bull (☆13, updated Oct 9, 2023)
- Optimizing diffusion for production-ready speeds (☆36, updated Jan 10, 2026)
- ☆13, updated Jan 15, 2025
- Making Flux go brrr on GPUs (☆163, updated Jan 5, 2026)
- [NeurIPS 2025] Radial Attention: O(n log n) sparse attention with energy decay for long video generation (☆583, updated Nov 11, 2025)
- ☆190, updated Jan 14, 2025
- [EMNLP 2025] An effective and interpretable weight-editing method for mitigating overly short reasoning in LLMs, and a mechanistic study un… (☆17, updated Dec 17, 2025)
- A powerful LoRA key converter for ComfyUI (☆28, updated Nov 17, 2025)
- A resume template written in Typst, designed for zh_CN (☆13, updated Mar 3, 2025)
- [TMM 2025] Official implementation of DreamJourney: Perpetual View Generation with Video Diffusion Models (☆17, updated Jun 24, 2025)
- Code for the paper [ICLR 2025 Oral] FlexPrefill: A Context-Aware Sparse Attention Mechanism for Efficient Long-Sequence Inference (☆161, updated Oct 13, 2025)
- High-performance inference engine for diffusion models (☆105, updated Sep 5, 2025)
- ☆65, updated Apr 26, 2025