BienLuky / CacheQuant
[CVPR 2025] The official implementation of "CacheQuant: Comprehensively Accelerated Diffusion Models"
☆25Updated last week
Alternatives and similar repositories for CacheQuant
Users who are interested in CacheQuant are comparing it to the libraries listed below.
- [CVPR 2024 Highlight & TPAMI 2025] This is the official PyTorch implementation of "TFMQ-DM: Temporal Feature Maintenance Quantization for…☆102Updated last week
- [ICML 2025] This is the official PyTorch implementation of "ZipAR: Accelerating Auto-regressive Image Generation through Spatial Locality…☆50Updated 3 months ago
- (ToCa-v2) A new version of ToCa, with faster speed and better acceleration!☆37Updated 4 months ago
- [CVPR 2025 Highlight] TinyFusion: Diffusion Transformers Learned Shallow☆130Updated 3 months ago
- [NeurIPS 2024] Learning-to-Cache: Accelerating Diffusion Transformer via Layer Caching☆107Updated last year
- [ICLR2025] Accelerating Diffusion Transformers with Token-wise Feature Caching☆162Updated 4 months ago
- [CVPR 2025] CoDe: Collaborative Decoding Makes Visual Auto-Regressive Modeling Efficient☆102Updated 3 months ago
- ☆171Updated 6 months ago
- Locality-aware Parallel Decoding for Efficient Autoregressive Image Generation☆59Updated this week
- ☆82Updated 3 months ago
- [CVPR 2025] Q-DiT: Accurate Post-Training Quantization for Diffusion Transformers☆53Updated 10 months ago
- (ICLR 2025) BinaryDM: Accurate Weight Binarization for Efficient Diffusion Models☆21Updated 9 months ago
- [ICCV2025] Generate one 2K image on a single 3090 GPU!☆48Updated 2 weeks ago
- [AAAI-2025] The official code for SiTo (Similarity-based Token Pruning for Stable Diffusion Models)☆35Updated last month
- The official implementation of "Neighboring Autoregressive Modeling for Efficient Visual Generation"☆54Updated 3 months ago
- Autoregressive Image Generation with Randomized Parallel Decoding☆69Updated 3 months ago
- HoliTom: Holistic Token Merging for Fast Video Large Language Models☆35Updated last month
- ☆14Updated 3 months ago
- FORA introduces a simple yet effective caching mechanism in the Diffusion Transformer architecture for faster inference sampling.☆47Updated last year
- Adaptive Caching for Faster Video Generation with Diffusion Transformers☆152Updated 8 months ago
- [NeurIPS 2024] The official implementation of "Dynamic Tuning Towards Parameter and Inference Efficiency for ViT Adaptation"☆46Updated 6 months ago
- [ICLR 2024 Spotlight] This is the official PyTorch implementation of "EfficientDM: Efficient Quantization-Aware Fine-Tuning of Low-Bit Di…☆62Updated last year
- Bag of Design Choices for Inference of High-Resolution Masked Generative Transformer☆16Updated 7 months ago
- Dimple, the first Discrete Diffusion Multimodal Large Language Model☆78Updated last week
- [ICCV'25] The official code implementation of paper "Combining Similarity and Importance for Video Token Reduction on Large Visual Langua…☆47Updated this week
- Official repository of InLine attention (NeurIPS 2024)☆49Updated 6 months ago
- DiCo: Revitalizing ConvNets for Scalable and Efficient Diffusion Modeling☆33Updated last month
- [ICLR'25] ViDiT-Q: Efficient and Accurate Quantization of Diffusion Transformers for Image and Video Generation☆103Updated 3 months ago
- ☆29Updated last year
- Official implementation of Next Block Prediction: Video Generation via Semi-Autoregressive Modeling☆37Updated 5 months ago