xuyang-liu16 / VidCom2
Video Compression Commander: Plug-and-Play Inference Acceleration for Video Large Language Models
☆36 · Updated last week
Alternatives and similar repositories for VidCom2
Users who are interested in VidCom2 are comparing it to the repositories listed below
- (CVPR 2025) PyramidDrop: Accelerating Your Large Vision-Language Models via Pyramid Visual Redundancy Reduction ☆132 · Updated 7 months ago
- [CVPR 2025] Adaptive Keyframe Sampling for Long Video Understanding ☆120 · Updated 2 months ago
- [CVPR 2025] DyCoke: Dynamic Compression of Tokens for Fast Video Large Language Models ☆81 · Updated last month
- [ACL 2025] PruneVid: Visual Token Pruning for Efficient Video Large Language Models ☆55 · Updated 5 months ago
- ☆125 · Updated 7 months ago
- TokLIP: Marry Visual Tokens to CLIP for Multimodal Comprehension and Generation ☆227 · Updated 2 months ago
- [ICCV 2025] p-MoD: Building Mixture-of-Depths MLLMs via Progressive Ratio Decay ☆43 · Updated 4 months ago
- Official repository of the paper "A Glimpse to Compress: Dynamic Visual Token Pruning for Large Vision-Language Models" ☆72 · Updated last month
- ☆58 · Updated 5 months ago
- [ICML'25] Official implementation of paper "SparseVLM: Visual Token Sparsification for Efficient Vision-Language Model Inference". ☆180 · Updated 4 months ago
- [CVPR 2025] VASparse: Towards Efficient Visual Hallucination Mitigation via Visual-Aware Token Sparsification ☆39 · Updated 7 months ago
- Global Compression Commander: Plug-and-Play Inference Acceleration for High-Resolution Large Vision-Language Models ☆33 · Updated 3 months ago
- [NeurIPS 2025] Official code for paper: Beyond Attention or Similarity: Maximizing Conditional Diversity for Token Pruning in MLLMs. ☆65 · Updated last month
- Official repository of 'ScaleCap: Inference-Time Scalable Image Captioning via Dual-Modality Debiasing' ☆57 · Updated 4 months ago
- [CVPR 2025] PVC: Progressive Visual Token Compression for Unified Image and Video Processing in Large Vision-Language Models ☆48 · Updated 4 months ago
- Code of LVAgent: Long Video Understanding by Multi-Round Dynamical Collaboration of MLLM Agents ☆17 · Updated 4 months ago
- Official code for paper: [CLS] Attention is All You Need for Training-Free Visual Token Pruning: Make VLM Inference Faster. ☆93 · Updated 4 months ago
- [ICLR 2025] See What You Are Told: Visual Attention Sink in Large Multimodal Models ☆65 · Updated 8 months ago
- Survey: https://arxiv.org/pdf/2507.20198 ☆179 · Updated last week
- [NeurIPS'25] HoliTom: Holistic Token Merging for Fast Video Large Language Models ☆52 · Updated 2 weeks ago
- [ICCV 2025] Official code for paper: Beyond Text-Visual Attention: Exploiting Visual Cues for Effective Token Pruning in VLMs ☆41 · Updated 3 months ago
- [ICLR'25] Reconstructive Visual Instruction Tuning ☆122 · Updated 6 months ago
- [EMNLP 2025 main 🔥] Code for "Stop Looking for Important Tokens in Multimodal Language Models: Duplication Matters More" ☆84 · Updated 2 weeks ago
- [ICCV 2025] Official code for "AIM: Adaptive Inference of Multi-Modal LLMs via Token Merging and Pruning" ☆42 · Updated 3 weeks ago
- Official code repo for our work "Native Visual Understanding: Resolving Resolution Dilemmas in Vision-Language Models" ☆50 · Updated 4 months ago
- [NeurIPS 2025 Spotlight] Think or Not Think: A Study of Explicit Thinking in Rule-Based Visual Reinforcement Fine-Tuning ☆69 · Updated last month
- [ICCV'25] The official code of paper "Combining Similarity and Importance for Video Token Reduction on Large Visual Language Models" ☆64 · Updated 3 weeks ago
- ☆27 · Updated 6 months ago
- [NIPS 2025 DB Oral] Official Repository of paper: Envisioning Beyond the Pixels: Benchmarking Reasoning-Informed Visual Editing ☆113 · Updated last week
- Official implementation of MIA-DPO ☆66 · Updated 9 months ago