Theia-4869 / CDPruner
Official code for paper: Beyond Attention or Similarity: Maximizing Conditional Diversity for Token Pruning in MLLMs.
☆31 Updated this week
Alternatives and similar repositories for CDPruner
Users who are interested in CDPruner are comparing it to the libraries listed below.
- Video Compression Commander: Plug-and-Play Inference Acceleration for Video Large Language Models ☆23 Updated 2 weeks ago
- [AAAI 2025] HiRED strategically drops visual tokens in the image encoding stage to improve inference efficiency for High-Resolution Visio… ☆39 Updated 2 months ago
- Fast-Slow Thinking for Large Vision-Language Model Reasoning ☆15 Updated last month
- ☆21 Updated 4 months ago
- Multi-Stage Vision Token Dropping: Towards Efficient Multimodal Large Language Model ☆30 Updated 5 months ago
- iLLaVA: An Image is Worth Fewer Than 1/3 Input Tokens in Large Multimodal Models ☆19 Updated 4 months ago
- Code release for VTW (AAAI 2025) Oral ☆43 Updated 5 months ago
- [CVPR2025] Hybrid-Level Instruction Injection for Video Token Compression in Multi-modal Large Language Models ☆12 Updated last month
- official repo for paper "[CLS] Token Tells Everything Needed for Training-free Efficient MLLMs" ☆22 Updated 2 months ago
- Autoregressive Semantic Visual Reconstruction Helps VLMs Understand Better ☆29 Updated last week
- Official Repository: A Comprehensive Benchmark for Logical Reasoning in MLLMs ☆38 Updated last week
- [ICLR 2025] The official pytorch implement of "Dynamic-LLaVA: Efficient Multimodal Large Language Models via Dynamic Vision-language Cont… ☆42 Updated 6 months ago
- Official project page of "HiMix: Reducing Computational Complexity in Large Vision-Language Models" ☆12 Updated 5 months ago
- Official repo of M$^2$PT: Multimodal Prompt Tuning for Zero-shot Instruction Learning ☆24 Updated 3 months ago
- [ICLR 2025] See What You Are Told: Visual Attention Sink in Large Multimodal Models ☆30 Updated 4 months ago
- ☆42 Updated 7 months ago
- Official implementation of MC-LLaVA. ☆28 Updated 3 weeks ago
- DeepPerception: Advancing R1-like Cognitive Visual Perception in MLLMs for Knowledge-Intensive Visual Grounding ☆61 Updated 2 weeks ago
- Less is More: High-value Data Selection for Visual Instruction Tuning ☆14 Updated 5 months ago
- [CVPR 2025] DivPrune: Diversity-based Visual Token Pruning for Large Multimodal Models ☆28 Updated last month
- ☆14 Updated last month
- [ICLR2025] γ-MOD: Mixture-of-Depth Adaptation for Multimodal Large Language Models ☆36 Updated 4 months ago
- [CVPR 2025] VASparse: Towards Efficient Visual Hallucination Mitigation via Visual-Aware Token Sparsification ☆34 Updated 3 months ago
- ICLR 2025 ☆26 Updated last month
- [CVPR 2025] PVC: Progressive Visual Token Compression for Unified Image and Video Processing in Large Vision-Language Models ☆41 Updated 2 weeks ago
- Code and data for paper "Exploring Hallucination of Large Multimodal Models in Video Understanding: Benchmark, Analysis and Mitigation". ☆16 Updated last month
- [ICML 2024] CrossGET: Cross-Guided Ensemble of Tokens for Accelerating Vision-Language Transformers. ☆32 Updated 5 months ago
- Official code for "AIM: Adaptive Inference of Multi-Modal LLMs via Token Merging and Pruning" ☆29 Updated last month
- a training-free approach to accelerate ViTs and VLMs by pruning redundant tokens based on similarity ☆28 Updated last month
- [CVPR 2025 (Oral)] Mitigating Hallucinations in Large Vision-Language Models via DPO: On-Policy Data Hold the Key ☆61 Updated 3 weeks ago