xuyang-liu16 / VidCom2
Video Compression Commander: Plug-and-Play Inference Acceleration for Video Large Language Models
☆28 Updated last month
Alternatives and similar repositories for VidCom2
Users who are interested in VidCom2 are comparing it to the libraries listed below
- (CVPR 2025) PyramidDrop: Accelerating Your Large Vision-Language Models via Pyramid Visual Redundancy Reduction ☆117 Updated 5 months ago
- ☆93 Updated 4 months ago
- [CVPR 2025] Adaptive Keyframe Sampling for Long Video Understanding ☆87 Updated 3 months ago
- The official repository for ACL2025 paper "PruneVid: Visual Token Pruning for Efficient Video Large Language Models". ☆51 Updated 2 months ago
- [CVPR 2025] DyCoke: Dynamic Compression of Tokens for Fast Video Large Language Models ☆64 Updated last month
- [ICCV 2025] p-MoD: Building Mixture-of-Depths MLLMs via Progressive Ratio Decay ☆41 Updated last month
- ☆54 Updated 3 months ago
- Global Compression Commander: Plug-and-Play Inference Acceleration for High-Resolution Large Vision-Language Models ☆30 Updated 2 weeks ago
- [ICCV 2025] Official code for "AIM: Adaptive Inference of Multi-Modal LLMs via Token Merging and Pruning" ☆34 Updated last month
- [CVPR 2025] PVC: Progressive Visual Token Compression for Unified Image and Video Processing in Large Vision-Language Models ☆45 Updated last month
- Official repo for "Streaming Video Understanding and Multi-round Interaction with Memory-enhanced Knowledge" ICLR2025 ☆64 Updated 4 months ago
- [ICLR2025] γ-MOD: Mixture-of-Depth Adaptation for Multimodal Large Language Models ☆37 Updated 5 months ago
- [ICML'25] Official implementation of paper "SparseVLM: Visual Token Sparsification for Efficient Vision-Language Model Inference". ☆135 Updated 2 months ago
- TokLIP: Marry Visual Tokens to CLIP for Multimodal Comprehension and Generation ☆103 Updated 2 months ago
- Autoregressive Semantic Visual Reconstruction Helps VLMs Understand Better ☆36 Updated last month
- Official code for paper: [CLS] Attention is All You Need for Training-Free Visual Token Pruning: Make VLM Inference Faster. ☆84 Updated last month
- Official PyTorch Code of ReKV (ICLR'25) ☆36 Updated 4 months ago
- Official Repository of paper: Envisioning Beyond the Pixels: Benchmarking Reasoning-Informed Visual Editing ☆79 Updated 3 weeks ago
- Official repository of 'ScaleCap: Inference-Time Scalable Image Captioning via Dual-Modality Debiasing' ☆52 Updated last month
- Survey: https://arxiv.org/pdf/2507.20198 ☆69 Updated this week
- Official implementation of MIA-DPO ☆63 Updated 6 months ago
- [CVPR'2025] VoCo-LLaMA: This repo is the official implementation of "VoCo-LLaMA: Towards Vision Compression with Large Language Models". ☆180 Updated last month
- [NeurIPS 2024] Visual Perception by Large Language Model's Weights ☆45 Updated 4 months ago
- TinyLLaVA-Video-R1: Towards Smaller LMMs for Video Reasoning ☆89 Updated 2 months ago
- [ACM MM 2025] TimeChat-online: 80% Visual Tokens are Naturally Redundant in Streaming Videos ☆66 Updated 3 weeks ago
- ☆27 Updated 4 months ago
- [ICLR'25] Reconstructive Visual Instruction Tuning ☆101 Updated 4 months ago
- [CVPR 2025 Oral] VideoEspresso: A Large-Scale Chain-of-Thought Dataset for Fine-Grained Video Reasoning via Core Frame Selection ☆108 Updated last week
- Official implementation of paper ReTaKe: Reducing Temporal and Knowledge Redundancy for Long Video Understanding ☆36 Updated 4 months ago
- ICML2025 ☆51 Updated 2 months ago