Theia-4869 / CDPruner
[NeurIPS 2025] Official code for paper: Beyond Attention or Similarity: Maximizing Conditional Diversity for Token Pruning in MLLMs.
★65 · Updated last month
Alternatives and similar repositories for CDPruner
Users who are interested in CDPruner are comparing it to the repositories listed below.
- [ICML'25] Official implementation of paper "SparseVLM: Visual Token Sparsification for Efficient Vision-Language Model Inference". ★180 · Updated 4 months ago
- [EMNLP 2025 main 🔥] Code for "Stop Looking for Important Tokens in Multimodal Language Models: Duplication Matters More" ★84 · Updated 3 weeks ago
- [CVPR 2025] DyCoke: Dynamic Compression of Tokens for Fast Video Large Language Models ★81 · Updated last month
- (CVPR 2025) PyramidDrop: Accelerating Your Large Vision-Language Models via Pyramid Visual Redundancy Reduction ★133 · Updated 7 months ago
- Official code for paper: [CLS] Attention is All You Need for Training-Free Visual Token Pruning: Make VLM Inference Faster. ★95 · Updated 4 months ago
- Survey: https://arxiv.org/pdf/2507.20198 ★186 · Updated last week
- ★58 · Updated 5 months ago
- [ICCV'25] The official code of paper "Combining Similarity and Importance for Video Token Reduction on Large Visual Language Models" ★64 · Updated last month
- This is the official implementation of our paper "QuoTA: Query-oriented Token Assignment via CoT Query Decouple for Long Video Comprehens… ★73 · Updated 6 months ago
- Code release for VTW (AAAI 2025 Oral) ★50 · Updated 3 months ago
- [CVPR 2025] DivPrune: Diversity-based Visual Token Pruning for Large Multimodal Models ★51 · Updated 5 months ago
- [ICLR 2025] The official pytorch implement of "Dynamic-LLaVA: Efficient Multimodal Large Language Models via Dynamic Vision-language Cont… ★60 · Updated last month
- Collection of token-level model compression resources. ★176 · Updated last month
- [CVPR 2025] Adaptive Keyframe Sampling for Long Video Understanding ★120 · Updated 2 months ago
- A paper list about Token Merge, Reduce, Resample, Drop for MLLMs. ★74 · Updated last week
- Video Compression Commander: Plug-and-Play Inference Acceleration for Video Large Language Models ★36 · Updated last week
- Official repository for VisionZip (CVPR 2025) ★366 · Updated 3 months ago
- ★125 · Updated 7 months ago
- LLaVA-PruMerge: Adaptive Token Reduction for Efficient Large Multimodal Models ★152 · Updated last month
- [ICLR'25] Official code for the paper 'MLLMs Know Where to Look: Training-free Perception of Small Visual Details with Multimodal LLMs' ★284 · Updated 6 months ago
- [NeurIPS'24] Efficient and accurate memory saving method towards W4A4 large multi-modal models. ★86 · Updated 9 months ago
- [CVPR 2025] PACT: Pruning and Clustering-Based Token Reduction for Faster Visual Language Models ★49 · Updated last month
- 🔥 CVPR 2025 Multimodal Large Language Models Paper List ★156 · Updated 7 months ago
- [ICLR 2025] LLaVA-MoD: Making LLaVA Tiny via MoE-Knowledge Distillation ★205 · Updated 7 months ago
- Official implementation of MC-LLaVA. ★140 · Updated 2 months ago
- [NeurIPS2024] Repo for the paper "ControlMLLM: Training-Free Visual Prompt Learning for Multimodal Large Language Models" ★195 · Updated 3 months ago
- [ACM MM 2025] TimeChat-online: 80% Visual Tokens are Naturally Redundant in Streaming Videos ★88 · Updated last month
- Collections of Papers and Projects for Multimodal Reasoning. ★105 · Updated 6 months ago
- TinyLLaVA-Video-R1: Towards Smaller LMMs for Video Reasoning ★105 · Updated 5 months ago
- Official code repo for our work "Native Visual Understanding: Resolving Resolution Dilemmas in Vision-Language Models" ★50 · Updated 4 months ago