OpenGVLab/VRBench

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/OpenGVLab/VRBench)

OpenGVLab / VRBench

[ICCV 2025] A Benchmark for Multi-Step Reasoning in Long Narrative Videos

☆28

Alternatives and similar repositories for VRBench

Users that are interested in VRBench are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

OpenGVLab / perception_test_iccv2023
View on GitHub
Champion Solutions repository for Perception Test challenges in ICCV2023 workshop.
☆14Oct 18, 2023Updated 2 years ago
Jialuo-Li / DIG
View on GitHub
[CVPR 2026] Divide, then Ground: Adapting Frame Selection to Query Types for Long-Form Video Understanding
☆21Feb 21, 2026Updated 4 months ago
64327069 / LVAgent
View on GitHub
Code of LVAgent: Long Video Understanding by Multi-Round Dynamical Collaboration of MLLM Agents
☆39Nov 24, 2025Updated 7 months ago
gyhdog99 / RACRO2
View on GitHub
Official PyTorch implementation of RACRO (https://www.arxiv.org/abs/2506.04559)
☆19Jul 1, 2025Updated last year
wwfnb / Laser
View on GitHub
☆16Sep 16, 2025Updated 10 months ago
Deploy to Railway using AI coding agents - Free Credits Offer • Ad
Use Claude Code, Codex, OpenCode, and more. Autonomous software development now has the infrastructure to match with Railway.
Vinoground / Vinoground
View on GitHub
☆13Apr 13, 2026Updated 3 months ago
XenoZLH / Shuffle-R1
View on GitHub
Official code repository of Shuffle-R1
☆26Feb 23, 2026Updated 4 months ago
iLearn-Lab / MM23-RTQ
View on GitHub
ACM Multimedia 2023 (Oral) - RTQ: Rethinking Video-language Understanding Based on Image-text Model
☆15Apr 7, 2026Updated 3 months ago
lingorX / LIIR
View on GitHub
☆17Jun 21, 2022Updated 4 years ago
nusnlp / d2vlm
View on GitHub
[ICCV 2025] Factorized Learning for Temporally Grounded Video-Language Models
☆24Apr 18, 2026Updated 3 months ago
Codeeaner / Computer-Use-Agent
View on GitHub
An AI Agent that is able to control your screen to complste any task
☆22Oct 23, 2025Updated 8 months ago
KlingAIResearch / VANS
View on GitHub
[CVPR 2026] Video-as-Answer: Predict and Generate Next Video Event with Joint-GRPO
☆119Feb 28, 2026Updated 4 months ago
mbzuai-oryx / VideoMolmo
View on GitHub
Official code of the paper "VideoMolmo: Spatio-Temporal Grounding meets Pointing"
☆56Jul 5, 2025Updated last year
DYZhang09 / ViTWSS3D
View on GitHub
[ICCV 23] A Simple Vision Transformer for Weakly Semi-supervised 3D Object Detection
☆13Apr 12, 2024Updated 2 years ago
Deploy to Railway using AI coding agents - Free Credits Offer • Ad
Use Claude Code, Codex, OpenCode, and more. Autonomous software development now has the infrastructure to match with Railway.
deep-spin / Infinite-Video
View on GitHub
\infty-Video: A Training-Free Approach to Long Video Understanding via Continuous-Time Memory Consolidation
☆21Feb 14, 2025Updated last year
chenllliang / MMEvalPro
View on GitHub
[NAACL 2025] Source code for MMEvalPro, a more trustworthy and efficient benchmark for evaluating LMMs
☆25Sep 26, 2024Updated last year
wgcyeo / WorldMM
View on GitHub
[CVPR 2026 Highlight] WorldMM: Dynamic Multimodal Memory Agent for Long Video Reasoning
☆96Jun 18, 2026Updated last month
TIGER-AI-Lab / VideoEval-Pro
View on GitHub
VideoEval-Pro: Robust and Realistic Long Video Understanding Evaluation [TMLR26]
☆15Jun 1, 2026Updated last month
jylins / videoseek
View on GitHub
[CVPR 2026] VideoSeek: Long-Horizon Video Agent with Tool-Guided Seeking
☆64Mar 23, 2026Updated 3 months ago
OpenGVLab / LORIS
View on GitHub
[ICML2023] Long-Term Rhythmic Video Soundtracker
☆63Jul 28, 2025Updated 11 months ago
OpenGVLab / TPO
View on GitHub
Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment
☆65Jul 22, 2025Updated 11 months ago
captcha-recognition / ocr_dataset
View on GitHub
常用的ocr数据集
☆16Nov 15, 2021Updated 4 years ago
fmu2 / snag_release
View on GitHub
Official Implementation of SnAG (CVPR 2024)
☆59Apr 26, 2025Updated last year
Virtual machines for every use case on DigitalOcean • Ad
Get dependable uptime with 99.99% SLA, simple security tools, and predictable monthly pricing with DigitalOcean's virtual machines, called Droplets.
TIGER-AI-Lab / VISTA
View on GitHub
The code for "VISTA: Enhancing Long-Duration and High-Resolution Video Understanding by VIdeo SpatioTemporal Augmentation" [CVPR2025]
☆20Feb 27, 2025Updated last year
Yan98 / GTA1
View on GitHub
☆130Oct 3, 2025Updated 9 months ago
Ranking-VMR / SPR
View on GitHub
☆13Jun 11, 2026Updated last month
look4u-ok / video-slicer
View on GitHub
☆18Jun 18, 2024Updated 2 years ago
DataArcTech / ChartBench
View on GitHub
☆16May 15, 2025Updated last year
fansunqi / VideoTool
View on GitHub
Official Repository for NeurIPS'25 Paper "Tool-Augmented Spatiotemporal Reasoning for Streamlining Video Question Answering Task"
☆23May 18, 2026Updated 2 months ago
CSU-JPG / MVPBench
View on GitHub
Seeing is Not Reasoning: MVPBench for Graph-based Evaluation of Multi-path Visual Physical CoT
☆15Jul 30, 2025Updated 11 months ago
MGitHubL / TMac
View on GitHub
☆14Feb 26, 2024Updated 2 years ago
path2generalist / General-Level
View on GitHub
On Path to Multimodal Generalist: General-Level and General-Bench
☆21Jul 11, 2025Updated last year
1-Click AI Models by DigitalOcean Gradient • Ad
Deploy popular AI models on DigitalOcean Gradient GPU virtual machines with just a single click. Zero configuration with optimized deployments.
yujiangpu20 / cma_xdVioDet
View on GitHub
Official code for "Audio-Guided Attention Network for Weakly Supervised Violence Detection" (ICCECE2022).
☆13Mar 25, 2022Updated 4 years ago
OpenGVLab / VKnowU
View on GitHub
[ECCV 2026] VKnowU: Evaluating Visual Knowledge Understanding in Multimodal LLMs
☆15Feb 3, 2026Updated 5 months ago
PolyU-ChenLab / ETBench
View on GitHub
👾 E.T. Bench: Towards Open-Ended Event-Level Video-Language Understanding (NeurIPS 2024)
☆74Jan 20, 2025Updated last year
JustinYuu / MM_Pyramid
View on GitHub
[ACM MM 2022] MM_Pyramid: Multimodal Pyramid Attentional Network for Audio-Visual Event Localization and Video Parsing
☆15Aug 26, 2022Updated 3 years ago
shjo-april / TRACE
View on GitHub
[ICLR 2026 Oral] TRACE: Your Diffusion Model Is Secretly an Instance Edge Detector
☆16Mar 2, 2026Updated 4 months ago
CSU-JPG / V-MAGE
View on GitHub
[ACL '26 Findings] V-MAGE: A Game Evaluation Framework for Assessing Visual-Centric Capabilities in MLLMs
☆27Apr 28, 2026Updated 2 months ago
naver-ai / muco
View on GitHub
Official Pytorch implementation of MuCo: Multi-turn Contrastive Learning for Multimodal Embedding Model (CVPR 2026)
☆15Apr 16, 2026Updated 3 months ago