maomaocun / dLLM-cache
Official PyTorch implementation of the paper "dLLM-Cache: Accelerating Diffusion Large Language Models with Adaptive Caching" (dLLM-Cache).
☆107 · Updated this week
Alternatives and similar repositories for dLLM-cache
Users who are interested in dLLM-cache are comparing it to the repositories listed below.
- Official implementation of "Fast-dLLM: Training-free Acceleration of Diffusion LLM by Enabling KV Cache and Parallel Decoding" ☆233 · Updated 2 weeks ago
- paper list, tutorial, and nano code snippet for Diffusion Large Language Models. ☆59 · Updated 3 weeks ago
- XAttention: Block Sparse Attention with Antidiagonal Scoring ☆166 · Updated this week
- 📚 Collection of token-level model compression resources. ☆122 · Updated last week
- [EMNLP 2024 Findings🔥] Official implementation of "LOOK-M: Look-Once Optimization in KV Cache for Efficient Multimodal Long-Context In… ☆97 · Updated 7 months ago
- Code release for VTW (AAAI 2025 Oral) ☆43 · Updated 5 months ago
- ☆82 · Updated last month
- [NeurIPS 2024] The official implementation of ZipCache: Accurate and Efficient KV Cache Quantization with Salient Token Identification ☆22 · Updated 2 months ago
- A Collection of Papers on Diffusion Language Models ☆81 · Updated last week
- [ICML 2025] This is the official PyTorch implementation of "ZipAR: Accelerating Auto-regressive Image Generation through Spatial Locality… ☆49 · Updated 2 months ago
- ✈️ Towards Stabilized and Efficient Diffusion Transformers through Long-Skip-Connections with Spectral Constraints ☆69 · Updated 2 months ago
- CoT-Valve: Length-Compressible Chain-of-Thought Tuning ☆73 · Updated 4 months ago
- The official code implementation of paper "Combining Similarity and Importance for Video Token Reduction on Large Visual Language Models" ☆44 · Updated last week
- [ICLR 2025] The official PyTorch implementation of "Dynamic-LLaVA: Efficient Multimodal Large Language Models via Dynamic Vision-language Cont… ☆42 · Updated 6 months ago
- ☆85 · Updated 2 months ago
- A paper list about Token Merge, Reduce, Resample, Drop for MLLMs. ☆62 · Updated 5 months ago
- [ICLR 2025] Dynamic Mixture of Experts: An Auto-Tuning Approach for Efficient Transformer Models ☆109 · Updated 4 months ago
- LLaVA-PruMerge: Adaptive Token Reduction for Efficient Large Multimodal Models ☆134 · Updated last year
- ☆152 · Updated last week
- A generalized framework for subspace tuning methods in parameter efficient fine-tuning. ☆142 · Updated 4 months ago
- ☆37 · Updated last month
- ☆104 · Updated 2 weeks ago
- Efficient Mixture of Experts for LLM Paper List ☆72 · Updated 6 months ago
- ☆51 · Updated 3 months ago
- [ICLR 2025] Mixture Compressor for Mixture-of-Experts LLMs Gains More ☆46 · Updated 4 months ago
- ☆167 · Updated 5 months ago
- [arXiv 2025] Efficient Reasoning Models: A Survey ☆181 · Updated last week
- Rethinking RL Scaling for Vision Language Models: A Transparent, From-Scratch Framework and Comprehensive Evaluation Scheme ☆131 · Updated 2 months ago
- Official code of *Virgo: A Preliminary Exploration on Reproducing o1-like MLLM* ☆104 · Updated 3 weeks ago
- [ICML 2025] Fourier Position Embedding: Enhancing Attention’s Periodic Extension for Length Generalization ☆71 · Updated 3 weeks ago