zhishuifeiqian / VCR-Bench
VCR-Bench: A Comprehensive Evaluation Framework for Video Chain-of-Thought Reasoning
☆19 Updated this week
Alternatives and similar repositories for VCR-Bench:
Users who are interested in VCR-Bench are comparing it to the repositories listed below.
- ☆60 Updated 3 weeks ago
- [Blog 1] Recording a bug of grpo_trainer in some R1 projects ☆18 Updated last month
- Unifying Visual Understanding and Generation with Dual Visual Vocabularies ☆37 Updated 3 weeks ago
- ☆30 Updated 8 months ago
- [CVPR 2025] OVO-Bench: How Far is Your Video-LLMs from Real-World Online Video Understanding? ☆53 Updated 2 weeks ago
- TemporalBench: Benchmarking Fine-grained Temporal Understanding for Multimodal Video Models ☆30 Updated 5 months ago
- Official implementation of MIA-DPO ☆55 Updated 2 months ago
- VideoNIAH: A Flexible Synthetic Method for Benchmarking Video MLLMs ☆47 Updated last month
- ☆40 Updated last week
- VoCoT: Unleashing Visually Grounded Multi-Step Reasoning in Large Multi-Modal Models ☆53 Updated 9 months ago
- [LLaVA-Video-R1] ✨ First Adaptation of R1 to LLaVA-Video (2025-03-18) ☆27 Updated 3 weeks ago
- (CVPR 2025) PyramidDrop: Accelerating Your Large Vision-Language Models via Pyramid Visual Redundancy Reduction ☆89 Updated last month
- ☆72 Updated last week
- Official repo for "Streaming Video Understanding and Multi-round Interaction with Memory-enhanced Knowledge" ICLR2025 ☆42 Updated last month
- Official repository of DoraemonGPT: Toward Understanding Dynamic Scenes with Large Language Models ☆83 Updated 7 months ago
- [ICLR'25] Reconstructive Visual Instruction Tuning ☆78 Updated last week
- ☆21 Updated 2 months ago
- VideoHallucer, The first comprehensive benchmark for hallucination detection in large video-language models (LVLMs) ☆28 Updated 2 weeks ago
- VisRL: Intention-Driven Visual Perception via Reinforced Reasoning ☆26 Updated last month
- [CVPR 2025 Oral] VideoEspresso: A Large-Scale Chain-of-Thought Dataset for Fine-Grained Video Reasoning via Core Frame Selection ☆72 Updated last week
- [CVPR 2025] PVC: Progressive Visual Token Compression for Unified Image and Video Processing in Large Vision-Language Models ☆37 Updated last month
- A Comprehensive Survey on Evaluating Reasoning Capabilities in Multimodal Large Language Models. ☆52 Updated last month
- Envolving Temporal Reasoning Capability into LMMs via Temporal Consistent Reward ☆30 Updated 3 weeks ago
- MME-CoT: Benchmarking Chain-of-Thought in LMMs for Reasoning Quality, Robustness, and Efficiency ☆93 Updated 3 weeks ago
- ☆34 Updated 9 months ago
- MME-Unify: A Comprehensive Benchmark for Unified Multimodal Understanding and Generation Models ☆30 Updated last week
- [NeurIPS'24 D&B] Official Dataloader and Evaluation Scripts for LongVideoBench. ☆94 Updated 8 months ago
- ACL'24 (Oral) Tuning Large Multimodal Models for Videos using Reinforcement Learning from AI Feedback ☆64 Updated 7 months ago
- This repo contains evaluation code for the paper "AV-Odyssey: Can Your Multimodal LLMs Really Understand Audio-Visual Information?" ☆23 Updated 3 months ago
- Data and Code for CVPR 2025 paper "MMVU: Measuring Expert-Level Multi-Discipline Video Understanding" ☆65 Updated last month