xuyang-liu16 / VidCom2
Video Compression Commander: Plug-and-Play Inference Acceleration for Video Large Language Models
☆34 Updated last month
Alternatives and similar repositories for VidCom2
Users who are interested in VidCom2 are comparing it to the repositories listed below.
- [CVPR 2025] Adaptive Keyframe Sampling for Long Video Understanding ☆108 Updated last month
- [CVPR 2025] DyCoke: Dynamic Compression of Tokens for Fast Video Large Language Models ☆79 Updated last week
- (CVPR 2025) PyramidDrop: Accelerating Your Large Vision-Language Models via Pyramid Visual Redundancy Reduction ☆130 Updated 7 months ago
- ☆122 Updated 6 months ago
- Global Compression Commander: Plug-and-Play Inference Acceleration for High-Resolution Large Vision-Language Models ☆33 Updated 2 months ago
- [ACL 2025] PruneVid: Visual Token Pruning for Efficient Video Large Language Models ☆54 Updated 4 months ago
- ☆58 Updated 5 months ago
- Official code for paper: [CLS] Attention is All You Need for Training-Free Visual Token Pruning: Make VLM Inference Faster. ☆92 Updated 3 months ago
- Survey: https://arxiv.org/pdf/2507.20198 ☆157 Updated last month
- [NeurIPS'25] HoliTom: Holistic Token Merging for Fast Video Large Language Models ☆47 Updated 2 weeks ago
- [ICML'25] Official implementation of paper "SparseVLM: Visual Token Sparsification for Efficient Vision-Language Model Inference". ☆163 Updated 4 months ago
- Official code repo for our work "Native Visual Understanding: Resolving Resolution Dilemmas in Vision-Language Models" ☆46 Updated 3 months ago
- TokLIP: Marry Visual Tokens to CLIP for Multimodal Comprehension and Generation ☆218 Updated last month
- [ICLR 2025] See What You Are Told: Visual Attention Sink in Large Multimodal Models ☆51 Updated 7 months ago
- [ICCV 2025] p-MoD: Building Mixture-of-Depths MLLMs via Progressive Ratio Decay ☆43 Updated 3 months ago
- Official repository of the paper "A Glimpse to Compress: Dynamic Visual Token Pruning for Large Vision-Language Models" ☆68 Updated last month
- [ICLR2025] γ-MOD: Mixture-of-Depth Adaptation for Multimodal Large Language Models ☆39 Updated 7 months ago
- [CVPR 2025] PVC: Progressive Visual Token Compression for Unified Image and Video Processing in Large Vision-Language Models ☆47 Updated 3 months ago
- Multi-Stage Vision Token Dropping: Towards Efficient Multimodal Large Language Model ☆35 Updated 9 months ago
- [CVPR 2025] VASparse: Towards Efficient Visual Hallucination Mitigation via Visual-Aware Token Sparsification ☆37 Updated 6 months ago
- A paper list about Token Merge, Reduce, Resample, Drop for MLLMs. ☆71 Updated 8 months ago
- [NIPS 2025 DB Oral] Official Repository of paper: Envisioning Beyond the Pixels: Benchmarking Reasoning-Informed Visual Editing ☆99 Updated 3 weeks ago
- [ICCV 2025] Official code for "AIM: Adaptive Inference of Multi-Modal LLMs via Token Merging and Pruning" ☆39 Updated 3 months ago
- [NeurIPS 2025] Official code for paper: Beyond Attention or Similarity: Maximizing Conditional Diversity for Token Pruning in MLLMs. ☆60 Updated 2 weeks ago
- Official repo for "Streaming Video Understanding and Multi-round Interaction with Memory-enhanced Knowledge" ICLR2025 ☆76 Updated 6 months ago
- Official repository of 'ScaleCap: Inference-Time Scalable Image Captioning via Dual-Modality Debiasing' ☆57 Updated 3 months ago
- [EMNLP 2025 main] Code for "Stop Looking for Important Tokens in Multimodal Language Models: Duplication Matters More" ☆74 Updated last month
- [NeurIPS 2025 Spotlight] Think or Not Think: A Study of Explicit Thinking in Rule-Based Visual Reinforcement Fine-Tuning ☆67 Updated 3 weeks ago
- [ACM MM 2025] TimeChat-online: 80% Visual Tokens are Naturally Redundant in Streaming Videos ☆81 Updated 3 weeks ago
- LLaVA-PruMerge: Adaptive Token Reduction for Efficient Large Multimodal Models ☆149 Updated last week