Theia-4869 / VisPrunerLinks

[ICCV 2025] Official code for paper: Beyond Text-Visual Attention: Exploiting Visual Cues for Effective Token Pruning in VLMs

☆50

Alternatives and similar repositories for VisPruner

Users that are interested in VisPruner are comparing it to the libraries listed below

Sorting:

Theia-4869 / FasterVLM
Official code for paper: [CLS] Attention is All You Need for Training-Free Visual Token Pruning: Make VLM Inference Faster.
☆97Updated 4 months ago
OpenGVLab / PVC
[CVPR 2025] PVC: Progressive Visual Token Compression for Unified Image and Video Processing in Large Vision-Language Models
☆50Updated 5 months ago
Haochen-Wang409 / TreeVGR
Official implementation of "Traceable Evidence Enhanced Visual Grounded Reasoning: Evaluation and Methodology"
☆70Updated 2 weeks ago
yu-rp / VisualPerceptionToken
☆126Updated 8 months ago
hasanar1f / HiRED
[AAAI 2025] HiRED strategically drops visual tokens in the image encoding stage to improve inference efficiency for High-Resolution Visio…
☆43Updated 7 months ago
KD-TAO / DyCoke
[CVPR 2025] DyCoke: Dynamic Compression of Tokens for Fast Video Large Language Models
☆87Updated last month
Alpha-Innovator / MME-Reasoning
Official Repository: A Comprehensive Benchmark for Logical Reasoning in MLLMs
☆43Updated 5 months ago
Adlith / MoE-Jetpack
[NeurIPS 24] MoE Jetpack: From Dense Checkpoints to Adaptive Mixture of Experts for Vision Tasks
☆133Updated last year
ywh187 / FitPrune
☆60Updated 6 months ago
xuyang-liu16 / VidCom2
[EMNLP 2025 Main] Video Compression Commander: Plug-and-Play Inference Acceleration for Video Large Language Models
☆38Updated 2 weeks ago
Cooperx521 / PyramidDrop
(CVPR 2025) PyramidDrop: Accelerating Your Large Vision-Language Models via Pyramid Visual Redundancy Reduction
☆134Updated 8 months ago
Hon-Wong / ByteVideoLLM
[ICCV 2025] Dynamic-VLM
☆26Updated 11 months ago
NUS-HPC-AI-Lab / SGL
☆28Updated 8 months ago
Gumpest / SparseVLMs
[ICML'25] Official implementation of paper "SparseVLM: Visual Token Sparsification for Efficient Vision-Language Model Inference".
☆193Updated 5 months ago
zhishuifeiqian / VCR-Bench
VCR-Bench: A Comprehensive Evaluation Framework for Video Chain-of-Thought Reasoning
☆32Updated 4 months ago
MME-Benchmarks / MME-RealWorld
✨✨ [ICLR 2025] MME-RealWorld: Could Your Multimodal LLM Challenge High-Resolution Real-World Scenarios that are Difficult for Humans?
☆142Updated last month
ZichenWen1 / DART
[EMNLP 2025 main 🔥] Code for "Stop Looking for Important Tokens in Multimodal Language Models: Duplication Matters More"
☆90Updated last month
XenoZLH / Shuffle-R1
Official code repository of Shuffle-R1
☆25Updated 2 months ago
thu-nics / FrameFusion
[ICCV'25] The official code of paper "Combining Similarity and Importance for Video Token Reduction on Large Visual Language Models"
☆67Updated 3 weeks ago
czg1225 / VeriThinker
[NeurIPS 2025] VeriThinker: Learning to Verify Makes Reasoning Model Efficient
☆62Updated last month
OpenGVLab / Mono-InternVL
[CVPR 2025] Mono-InternVL: Pushing the Boundaries of Monolithic Multimodal Large Language Models with Endogenous Visual Pre-training
☆92Updated 4 months ago
cokeshao / Awesome-Multimodal-Token-Compression
Survey: https://arxiv.org/pdf/2507.20198
☆203Updated last month
Fantasyele / LLaVA-KD
[ICCV 2025] Official implementation of LLaVA-KD: A Framework of Distilling Multimodal Large Language Models
☆107Updated last month
eric-ai-lab / GRIT
Official code for NeurIPS 2025 paper "GRIT: Teaching MLLMs to Think with Images"
☆163Updated last month
Theia-4869 / CDPruner
[NeurIPS 2025] Official code for paper: Beyond Attention or Similarity: Maximizing Conditional Diversity for Token Pruning in MLLMs.
☆68Updated 2 months ago
seilk / VisAttnSink
[ICLR 2025] See What You Are Told: Visual Attention Sink in Large Multimodal Models
☆69Updated 9 months ago
MCG-NJU / p-MoD
[ICCV 2025] p-MoD: Building Mixture-of-Depths MLLMs via Progressive Ratio Decay
☆43Updated 4 months ago
kxfan2002 / SophiaVL-R1
SophiaVL-R1: Reinforcing MLLMs Reasoning with Thinking Reward
☆86Updated 3 months ago
LaVi-Lab / AIM
[ICCV 2025] Official code for "AIM: Adaptive Inference of Multi-Modal LLMs via Token Merging and Pruning"
☆44Updated last month
Mozhgan91 / LEO
LEO: A powerful Hybrid Multimodal LLM
☆18Updated 10 months ago