llyx97 / video_reason_bench
A benchmark for evaluating vision-centric, complex video reasoning.
☆28, updated last month
Alternatives and similar repositories for video_reason_bench
Users who are interested in video_reason_bench compare it to the repositories listed below.
- Reinforcement Learning Tuning for VideoLLMs: Reward Design and Data Efficiency ☆44, updated last month
- ☆89, updated 3 months ago
- Official repository of "ScaleCap: Inference-Time Scalable Image Captioning via Dual-Modality Debiasing" ☆50, updated 3 weeks ago
- [NeurIPS '24 D&B] Official Dataloader and Evaluation Scripts for LongVideoBench. ☆103, updated 11 months ago
- Official repository of MMDU dataset ☆92, updated 9 months ago
- NoisyRollout: Reinforcing Visual Reasoning with Data Augmentation ☆78, updated last month
- [ICCV 2025] LVBench: An Extreme Long Video Understanding Benchmark ☆95, updated last week
- Official implementation of MIA-DPO ☆59, updated 5 months ago
- [ACM MM 2025] TimeChat-online: 80% Visual Tokens are Naturally Redundant in Streaming Videos ☆60, updated this week
- ☆50, updated 3 weeks ago
- Official repo for "Streaming Video Understanding and Multi-round Interaction with Memory-enhanced Knowledge" (ICLR 2025) ☆59, updated 4 months ago
- ☆25, updated 5 months ago
- [ECCV 2024] Paying More Attention to Image: A Training-Free Method for Alleviating Hallucination in LVLMs ☆128, updated 8 months ago
- [CVPR 2025] OVO-Bench: How Far is Your Video-LLMs from Real-World Online Video Understanding? ☆75, updated 3 months ago
- TokLIP: Marry Visual Tokens to CLIP for Multimodal Comprehension and Generation ☆101, updated last month
- MME-CoT: Benchmarking Chain-of-Thought in LMMs for Reasoning Quality, Robustness, and Efficiency ☆117, updated 3 weeks ago
- The Next Step Forward in Multimodal LLM Alignment ☆170, updated 2 months ago
- ACL'24 (Oral) Tuning Large Multimodal Models for Videos using Reinforcement Learning from AI Feedback ☆67, updated 10 months ago
- [ACL 2024 Findings] "TempCompass: Do Video LLMs Really Understand Videos?", Yuanxin Liu, Shicheng Li, Yi Liu, Yuxiang Wang, Shuhuai Ren, … ☆118, updated 3 months ago
- [LLaVA-Video-R1]✨First Adaptation of R1 to LLaVA-Video (2025-03-18) ☆29, updated 2 months ago
- Data and Code for CVPR 2025 paper "MMVU: Measuring Expert-Level Multi-Discipline Video Understanding" ☆68, updated 4 months ago
- This repository is the official implementation of "Look-Back: Implicit Visual Re-focusing in MLLM Reasoning". ☆29, updated last week
- [NeurIPS 2024] This repo contains evaluation code for the paper "Are We on the Right Way for Evaluating Large Vision-Language Models" ☆185, updated 9 months ago
- R1-like Video-LLM for Temporal Grounding ☆109, updated 3 weeks ago
- Official implementation of the paper: RICO: Improving Accuracy and Completeness in Image Recaptioning via Visual Reconstruction ☆14, updated last month
- VideoNIAH: A Flexible Synthetic Method for Benchmarking Video MLLMs ☆48, updated 4 months ago
- VoCoT: Unleashing Visually Grounded Multi-Step Reasoning in Large Multi-Modal Models ☆69, updated last year
- ☆87, updated 3 weeks ago
- ☆152, updated 8 months ago
- ☆137, updated 9 months ago