xuyang-liu16 / VidCom2
Video Compression Commander: Plug-and-Play Inference Acceleration for Video Large Language Models
☆24 Updated last month
Alternatives and similar repositories for VidCom2
Users who are interested in VidCom2 are comparing it to the repositories listed below.
- [CVPR 2025] PVC: Progressive Visual Token Compression for Unified Image and Video Processing in Large Vision-Language Models ☆43 Updated last month
- LEO: A powerful Hybrid Multimodal LLM ☆18 Updated 5 months ago
- Official implementation of MC-LLaVA. ☆31 Updated last month
- [CVPR 2025] Adaptive Keyframe Sampling for Long Video Understanding ☆80 Updated 2 months ago
- ☆88 Updated 3 months ago
- [ICLR2025] γ-MOD: Mixture-of-Depth Adaptation for Multimodal Large Language Models ☆37 Updated 5 months ago
- The official repository for ACL2025 paper "PruneVid: Visual Token Pruning for Efficient Video Large Language Models". ☆49 Updated 2 months ago
- [AAAI 2025] HiRED strategically drops visual tokens in the image encoding stage to improve inference efficiency for High-Resolution Visio… ☆39 Updated 2 months ago
- ☆22 Updated 4 months ago
- Official code for paper: [CLS] Attention is All You Need for Training-Free Visual Token Pruning: Make VLM Inference Faster. ☆82 Updated 2 weeks ago
- TinyLLaVA-Video-R1: Towards Smaller LMMs for Video Reasoning ☆83 Updated last month
- TokLIP: Marry Visual Tokens to CLIP for Multimodal Comprehension and Generation ☆101 Updated last month
- Official PyTorch Code of ReKV (ICLR'25) ☆33 Updated 4 months ago
- [ICCV 2025] Dynamic-VLM ☆21 Updated 7 months ago
- ☆53 Updated 2 months ago
- Global Compression Commander: Plug-and-Play Inference Acceleration for High-Resolution Large Vision-Language Models ☆30 Updated last month
- (CVPR 2025) PyramidDrop: Accelerating Your Large Vision-Language Models via Pyramid Visual Redundancy Reduction ☆115 Updated 4 months ago
- [CVPR 2025] DyCoke: Dynamic Compression of Tokens for Fast Video Large Language Models ☆58 Updated 2 weeks ago
- [CVPR 2025] VASparse: Towards Efficient Visual Hallucination Mitigation via Visual-Aware Token Sparsification ☆34 Updated 3 months ago
- Fast-Slow Thinking for Large Vision-Language Model Reasoning ☆16 Updated 2 months ago
- Official repository of "CoMP: Continual Multimodal Pre-training for Vision Foundation Models" ☆29 Updated 3 months ago
- Official repo for "Streaming Video Understanding and Multi-round Interaction with Memory-enhanced Knowledge" ICLR2025 ☆58 Updated 4 months ago
- Autoregressive Semantic Visual Reconstruction Helps VLMs Understand Better ☆31 Updated last month
- Official implementation of paper ReTaKe: Reducing Temporal and Knowledge Redundancy for Long Video Understanding ☆34 Updated 4 months ago
- Official repo for paper "[CLS] Token Tells Everything Needed for Training-free Efficient MLLMs" ☆22 Updated 2 months ago
- Multi-Stage Vision Token Dropping: Towards Efficient Multimodal Large Language Model ☆30 Updated 6 months ago
- Video-Holmes: Can MLLM Think Like Holmes for Complex Video Reasoning? ☆60 Updated this week
- [CVPR 2025] LLaVA-ST: A Multimodal Large Language Model for Fine-Grained Spatial-Temporal Understanding ☆53 Updated last week
- ☆86 Updated 3 weeks ago
- [CVPR 2025] Mono-InternVL: Pushing the Boundaries of Monolithic Multimodal Large Language Models with Endogenous Visual Pre-training ☆48 Updated 3 months ago