OpenGVLab/PVC

[CVPR 2025] PVC: Progressive Visual Token Compression for Unified Image and Video Processing in Large Vision-Language Models

☆54

Alternatives and similar repositories for PVC

Users who are interested in PVC are comparing it to the libraries listed below.

ziqipang / MR-Video
MR. Video: MapReduce is the Principle for Long Video Understanding
☆31 · Jun 18, 2026 · Updated last month
LaVi-Lab / AIM
[ICCV 2025] Official code for "AIM: Adaptive Inference of Multi-Modal LLMs via Token Merging and Pruning"
☆65 · Oct 9, 2025 · Updated 9 months ago
xuyang-liu16 / GlobalCom2
[AAAI 2026] Global Compression Commander: Plug-and-Play Inference Acceleration for High-Resolution Large Vision-Language Models
☆42 · Jan 27, 2026 · Updated 5 months ago
fansunqi / AKeyS
Agentic Keyframe Search for Video Question Answering
☆18 · Jun 30, 2026 · Updated 2 weeks ago
SCZwangxiao / video-ReTaKe
Official implementation of paper ReTaKe: Reducing Temporal and Knowledge Redundancy for Long Video Understanding
☆40 · Mar 16, 2025 · Updated last year
daeunni / Video-Skill-CoT
Code for "Skill-based Chain-of-Thoughts for Domain-Adaptive Video Reasoning [EMNLP 2025 Findings]"
☆18 · Aug 27, 2025 · Updated 10 months ago
Hon-Wong / ByteVideoLLM
[ICCV 2025] Dynamic-VLM
☆28 · Dec 16, 2024 · Updated last year
daniel-cores / tvbench
TVBench: Redesigning Video-Language Evaluation
☆15 · Jun 9, 2025 · Updated last year
Andy-Cheng / TEMPURA
TEMPURA enables video-language models to reason about causal event relationships and generate fine-grained, timestamped descriptions of u…
☆27 · Jun 4, 2025 · Updated last year
ywh187 / FitPrune
☆68 · Jan 23, 2026 · Updated 5 months ago
marinero4972 / CyberV
☆20 · Jun 10, 2025 · Updated last year
w-yibo / VTC-R1
VTC-R1: Vision-Text Compression for Efficient Long-Context Reasoning.
☆26 · Feb 20, 2026 · Updated 5 months ago
fansunqi / VideoTool
Official Repository for NeurIPS'25 Paper "Tool-Augmented Spatiotemporal Reasoning for Streamlining Video Question Answering Task"
☆23 · May 18, 2026 · Updated 2 months ago
yellow-binary-tree / HawkEye
Official implementation of HawkEye: Training Video-Text LLMs for Grounding Text in Videos
☆47 · Apr 29, 2024 · Updated 2 years ago
hasanar1f / HiRED
[AAAI 2025] HiRED strategically drops visual tokens in the image encoding stage to improve inference efficiency for High-Resolution Visio…
☆58 · Apr 18, 2025 · Updated last year
mingluzhao / Latent-Plan-Transformer
Source code for "Latent Plan Transformer for Trajectory Abstraction: Planning as Latent Space Inference." In NeurIPS 2024
☆21 · Dec 1, 2024 · Updated last year
zai-org / MotionBench
Official code for MotionBench (CVPR 2025)
☆76 · Mar 3, 2025 · Updated last year
vgbench / VGBench
☆19 · Sep 19, 2024 · Updated last year
SparrowZheyuan18 / Awesome-Geolocalization
A Paper List for Geo-localization Research
☆19 · Sep 2, 2024 · Updated last year
64327069 / LVAgent
Code of LVAgent: Long Video Understanding by Multi-Round Dynamical Collaboration of MLLM Agents
☆39 · Nov 24, 2025 · Updated 7 months ago
open-compass / MMBench-GUI
Official repo of "MMBench-GUI: Hierarchical Multi-Platform Evaluation Framework for GUI Agents". It can be used to evaluate a GUI agent w…
☆112 · Sep 8, 2025 · Updated 10 months ago
jiyt17 / IDA-VLM
[ICLR 2025] IDA-VLM: Towards Movie Understanding via ID-Aware Large Vision-Language Model
☆37 · Nov 27, 2024 · Updated last year
ChuanyangZheng / L2ViT
Official PyTorch implementation of The Linear Attention Resurrection in Vision Transformer
☆15 · Sep 7, 2024 · Updated last year
hyungjin-chung / VPS
☆16 · Sep 11, 2025 · Updated 10 months ago
ethansmith2000 / ImprovedTokenMerge
☆49 · Mar 3, 2024 · Updated 2 years ago
mlvlab / DeepVideoR1
[NeurIPS25] Official Implementation (Pytorch) of "DeepVideo-R1"
☆35 · Feb 22, 2026 · Updated 4 months ago
bigai-ai / QA-Synthesizer
Adapt MLLMs to Domains via Post-Training (EMNLP 2025 Findings)
☆14 · Nov 11, 2025 · Updated 8 months ago
xinyouu / V-CAST
V-CAST: Video Curvature-Aware Spatio-Temporal Pruning for Efficient Video Large Language Models
☆34 · Apr 16, 2026 · Updated 3 months ago
haowei-freesky / HERMES
Official Repository for paper "HERMES: KV Cache as Hierarchical Memory for Efficient Streaming Video Understanding" [ACL 2026]
☆92 · May 8, 2026 · Updated 2 months ago
gls0425 / LinVT
LinVT: Empower Your Image-level Large Language Model to Understand Videos
☆83 · Dec 30, 2024 · Updated last year
EvolvingLMMs-Lab / ParaVT
ParaVT: Taming the Tool Prior Paradox for Parallel Tool Use in Agentic Video Reinforcement Learning
☆54 · Jun 2, 2026 · Updated last month
OpenGVLab / De-focus-Attention-Networks
Learning 1D Causal Visual Representation with De-focus Attention Networks
☆35 · Jun 7, 2024 · Updated 2 years ago
NoakLiu / MT2ST
Accelerating Multitask Training Through Adaptive Transition [Efficient ML Model]
☆12 · May 23, 2025 · Updated last year
Theia-4869 / FasterVLM
Official code for paper: [CLS] Attention is All You Need for Training-Free Visual Token Pruning: Make VLM Inference Faster.
☆114 · Jun 29, 2025 · Updated last year
aiha-lab / InfiniPot-V
[NeurIPS 25] InfiniPot-V: Memory-Constrained KV Cache Compression for Streaming Video Understanding
☆20 · Jan 25, 2026 · Updated 5 months ago
MCG-NJU / CaReBench
A Fine-grained Benchmark for Video Captioning and Retrieval
☆30 · Jul 16, 2025 · Updated last year
Victorwz / LLaVA-Unified
☆23 · Aug 27, 2025 · Updated 10 months ago
Ziyang412 / Video-RTS
Code for EMNLP25 paper "Video-RTS: Rethinking Reinforcement Learning and Test-Time Scaling for Efficient and Enhanced Video Reasoning"
☆24 · Feb 18, 2026 · Updated 5 months ago
Fsoft-AIC / Z-GMOT
[NAACL 2024] Z-GMOT: Zero-shot Generic Multiple Object Tracking
☆12 · May 19, 2026 · Updated 2 months ago