FastCache: Fast Caching for Diffusion Transformer Through Learnable Linear Approximation [Efficient ML Model]
☆46 · Feb 17, 2026 · Updated 2 weeks ago
Alternatives and similar repositories for FastCache-xDiT
Users interested in FastCache-xDiT are comparing it to the libraries listed below.
- PiKV: KV Cache Management System for Mixture of Experts [Efficient ML System] ☆48 · Feb 24, 2026 · Updated last week
- Accelerating Multitask Training Through Adaptive Transition [Efficient ML Model] ☆12 · May 23, 2025 · Updated 9 months ago
- Adaptive Topology Reconstruction for Robust Graph Representation Learning [Efficient ML Model] ☆10 · Feb 11, 2025 · Updated last year
- ☆14 · Jan 23, 2026 · Updated last month
- [ECCV24] MixDQ: Memory-Efficient Few-Step Text-to-Image Diffusion Models with Metric-Decoupled Mixed Precision Quantization ☆49 · Nov 27, 2024 · Updated last year
- Kernel Library Wheel for SGLang ☆16 · Updated this week
- LoRAFusion: Efficient LoRA Fine-Tuning for LLMs ☆24 · Sep 23, 2025 · Updated 5 months ago
- Demo for Qwen2.5-VL-3B-Instruct on Axera device. ☆17 · Sep 3, 2025 · Updated 6 months ago
- Source code for "Latent Plan Transformer for Trajectory Abstraction: Planning as Latent Space Inference" (NeurIPS 2024) ☆21 · Dec 1, 2024 · Updated last year
- [NeurIPS 2023] ShiftAddViT: Mixture of Multiplication Primitives Towards Efficient Vision Transformer ☆30 · Dec 6, 2023 · Updated 2 years ago
- Model Quantization Benchmark ☆18 · Sep 30, 2025 · Updated 5 months ago
- ☆22 · May 5, 2025 · Updated 9 months ago
- ☆19 · Apr 3, 2025 · Updated 11 months ago
- Context-parallel attention that accelerates DiT model inference with dynamic caching (https://wavespeed.ai/) ☆424 · Jul 5, 2025 · Updated 7 months ago
- Personalized Regression ☆17 · Dec 29, 2019 · Updated 6 years ago
- Multiple GEMM operators constructed with CUTLASS to support LLM inference ☆20 · Aug 3, 2025 · Updated 7 months ago
- A parallel VAE that avoids OOM for high-resolution image generation ☆85 · Aug 4, 2025 · Updated 6 months ago
- ☆85 · Jan 23, 2025 · Updated last year
- [CVPR 2025] PVC: Progressive Visual Token Compression for Unified Image and Video Processing in Large Vision-Language Models ☆51 · Jun 12, 2025 · Updated 8 months ago
- Quantized Attention on GPU ☆44 · Nov 22, 2024 · Updated last year
- Aiming to integrate most existing feature-caching-based diffusion acceleration schemes into a unified framework. ☆91 · Oct 23, 2025 · Updated 4 months ago
- SpInfer: Leveraging Low-Level Sparsity for Efficient Large Language Model Inference on GPUs ☆60 · Mar 25, 2025 · Updated 11 months ago
- 📚 A curated list of Awesome Diffusion Inference Papers with Codes: Sampling, Cache, Quantization, Parallelism, etc. 🎉 ☆525 · Updated this week
- ☆65 · Apr 26, 2025 · Updated 10 months ago
- High-performance inference engine for diffusion models ☆105 · Sep 5, 2025 · Updated 5 months ago
- ☆61 · Nov 27, 2023 · Updated 2 years ago
- Estimate MFU for DeepSeekV3 ☆26 · Jan 5, 2025 · Updated last year
- ☆50 · May 19, 2025 · Updated 9 months ago
- Official implementation of "MetaLA: Unified Optimal Linear Approximation to Softmax Attention Map" (NeurIPS 2024 Oral) ☆34 · Jan 18, 2025 · Updated last year
- Code for Draft Attention ☆99 · May 22, 2025 · Updated 9 months ago
- [DAC'25] Official implementation of "HybriMoE: Hybrid CPU-GPU Scheduling and Cache Management for Efficient MoE Inference" ☆101 · Dec 15, 2025 · Updated 2 months ago
- Combining TeaCache with xDiT to Accelerate Visual Generation Models ☆32 · Apr 21, 2025 · Updated 10 months ago
- An auxiliary project analyzing the characteristics of KV in DiT attention ☆33 · Nov 29, 2024 · Updated last year
- Asynchronous pipeline-parallel optimization ☆19 · Feb 2, 2026 · Updated last month
- Transformers components but in Triton ☆34 · May 9, 2025 · Updated 9 months ago
- Patches for Hugging Face Transformers to save memory ☆34 · Jun 2, 2025 · Updated 9 months ago
- Timestep Embedding Tells: It's Time to Cache for Video Diffusion Model ☆1,271 · Jun 8, 2025 · Updated 8 months ago
- Prefix-Aware Attention for LLM Decoding ☆29 · Jan 23, 2026 · Updated last month
- Symphony: A decentralized multi-agent framework that enables intelligent agents to collaborate seamlessly across heterogeneous edge devices… ☆30 · Oct 30, 2025 · Updated 4 months ago