ModelTC / HarmoniCa
[ICML 2025] This is the official PyTorch implementation of "🎵 HarmoniCa: Harmonizing Training and Inference for Better Feature Caching in Diffusion Transformer Acceleration".
☆42Updated last month
Alternatives and similar repositories for HarmoniCa
Users who are interested in HarmoniCa are comparing it to the repositories listed below.
- This repository contains the code for our ICML 2025 paper, LENSLLM: Unveiling Fine-Tuning Dynamics for LLM Selection🎉☆25Updated 3 months ago
- VeriThinker: Learning to Verify Makes Reasoning Model Efficient☆52Updated last month
- [ICML2025] Official Code of From Local Details to Global Context: Advancing Vision-Language Models with Attention-Based Selection☆22Updated 2 months ago
- ☆54Updated 3 months ago
- Doodling our way to AGI ✏️ 🖼️ 🧠☆94Updated 3 months ago
- Dimple, the first Discrete Diffusion Multimodal Large Language Model☆95Updated last month
- 📚 Collection of token-level model compression resources.☆151Updated this week
- Survey: https://arxiv.org/pdf/2507.20198☆121Updated this week
- Code release for VTW (AAAI 2025 Oral)☆49Updated last month
- TokLIP: Marry Visual Tokens to CLIP for Multimodal Comprehension and Generation☆190Updated last week
- [Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics]: VisuoThink: Empowering LVLM Reasoning with Mul…☆29Updated last month
- Official implementation of MIA-DPO☆64Updated 7 months ago
- NoisyRollout: Reinforcing Visual Reasoning with Data Augmentation☆85Updated 2 weeks ago
- ☆67Updated 3 weeks ago
- MME-Unify: A Comprehensive Benchmark for Unified Multimodal Understanding and Generation Models☆41Updated 4 months ago
- Code for "The Devil behind the mask: An emergent safety vulnerability of Diffusion LLMs"☆61Updated last month
- Fast-Slow Thinking for Large Vision-Language Model Reasoning☆17Updated 4 months ago
- SFT+RL boosts multimodal reasoning☆27Updated 2 months ago
- (ArXiv25) Vision Matters: Simple Visual Perturbations Can Boost Multimodal Math Reasoning☆56Updated last month
- [CVPR] MergeVQ: A Unified Framework for Visual Generation and Representation with Token Merging and Quantization☆42Updated last month
- [ICML 2025] This is the official PyTorch implementation of "ZipAR: Accelerating Auto-regressive Image Generation through Spatial Locality…☆52Updated 5 months ago
- ☆105Updated 5 months ago
- ☆41Updated last month
- Multi-Stage Vision Token Dropping: Towards Efficient Multimodal Large Language Model☆34Updated 7 months ago
- Machine Mental Imagery: Empower Multimodal Reasoning with Latent Visual Tokens (arXiv 2025)☆145Updated 3 weeks ago
- Official implementation of "Traceable Evidence Enhanced Visual Grounded Reasoning: Evaluation and Methodology"☆60Updated last month
- (CVPR 2025) PyramidDrop: Accelerating Your Large Vision-Language Models via Pyramid Visual Redundancy Reduction☆120Updated 5 months ago
- ☆57Updated 3 months ago
- CODA: Coordinating the Cerebrum and Cerebellum for a Dual-Brain Computer Use Agent with Decoupled Reinforcement Learning☆25Updated this week
- SophiaVL-R1: Reinforcing MLLMs Reasoning with Thinking Reward☆74Updated 3 weeks ago