Theia-4869 / VisPruner
[ICCV 2025] Official code for paper: Beyond Text-Visual Attention: Exploiting Visual Cues for Effective Token Pruning in VLMs
☆19 Updated last month
Alternatives and similar repositories for VisPruner
Users who are interested in VisPruner are comparing it to the repositories listed below.
- Video Compression Commander: Plug-and-Play Inference Acceleration for Video Large Language Models ☆28 Updated 2 months ago
- Official code for paper: [CLS] Attention is All You Need for Training-Free Visual Token Pruning: Make VLM Inference Faster. ☆84 Updated last month
- Survey: https://arxiv.org/pdf/2507.20198 ☆69 Updated last week
- [CVPR 2025] DyCoke: Dynamic Compression of Tokens for Fast Video Large Language Models ☆68 Updated last month
- [CVPR 2025] PVC: Progressive Visual Token Compression for Unified Image and Video Processing in Large Vision-Language Models ☆45 Updated 2 months ago
- [AAAI 2025] HiRED strategically drops visual tokens in the image encoding stage to improve inference efficiency for High-Resolution Visio… ☆41 Updated 3 months ago
- ☆99 Updated 4 months ago
- [ICLR 2025] γ-MOD: Mixture-of-Depth Adaptation for Multimodal Large Language Models ☆37 Updated 5 months ago
- Official Repository: A Comprehensive Benchmark for Logical Reasoning in MLLMs ☆40 Updated last month
- Official repository of the video reasoning benchmark MMR-V. Can Your MLLMs "Think with Video"? ☆35 Updated last month
- Autoregressive Semantic Visual Reconstruction Helps VLMs Understand Better ☆36 Updated last month
- Official implementation of "Traceable Evidence Enhanced Visual Grounded Reasoning: Evaluation and Methodology" ☆49 Updated 3 weeks ago
- (CVPR 2025) PyramidDrop: Accelerating Your Large Vision-Language Models via Pyramid Visual Redundancy Reduction ☆117 Updated 5 months ago
- [ICML 2024] CrossGET: Cross-Guided Ensemble of Tokens for Accelerating Vision-Language Transformers. ☆34 Updated 7 months ago
- LEO: A powerful Hybrid Multimodal LLM ☆18 Updated 6 months ago
- [CVPR 2025] DivPrune: Diversity-based Visual Token Pruning for Large Multimodal Models ☆40 Updated 2 months ago
- ☆23 Updated 5 months ago
- ☆54 Updated 3 months ago
- [ICCV 2025] Dynamic-VLM ☆23 Updated 7 months ago
- [ICML'25] Official implementation of paper "SparseVLM: Visual Token Sparsification for Efficient Vision-Language Model Inference". ☆138 Updated 2 months ago
- [ICCV 2025] p-MoD: Building Mixture-of-Depths MLLMs via Progressive Ratio Decay ☆41 Updated last month
- HoliTom: Holistic Token Merging for Fast Video Large Language Models ☆39 Updated 2 months ago
- VeriThinker: Learning to Verify Makes Reasoning Model Efficient ☆49 Updated last month
- [ICCV 2025] Official code for "AIM: Adaptive Inference of Multi-Modal LLMs via Token Merging and Pruning" ☆35 Updated last month
- ☆43 Updated 9 months ago
- Official PyTorch implementation of RACRO (https://www.arxiv.org/abs/2506.04559) ☆17 Updated last month
- Code for "Stop Looking for Important Tokens in Multimodal Language Models: Duplication Matters More" ☆64 Updated 3 months ago
- Adapting LLaMA Decoder to Vision Transformer ☆29 Updated last year
- MME-Unify: A Comprehensive Benchmark for Unified Multimodal Understanding and Generation Models ☆41 Updated 4 months ago
- DeepPerception: Advancing R1-like Cognitive Visual Perception in MLLMs for Knowledge-Intensive Visual Grounding ☆65 Updated 2 months ago