Video-Reason / VBVR-EvalKit
This is a framework for evaluating reasoning in foundation video models.
☆49 · Updated last week
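As a rough, purely hypothetical illustration of what such an evaluation harness tends to look like (none of the names below come from VBVR-EvalKit, and exact-match scoring is an assumption), the sketch pairs a benchmark of video/question/reference triples with a model callable and reports accuracy:

```python
# Hypothetical sketch only: these names and the scoring scheme are illustrative
# and are NOT taken from VBVR-EvalKit. It shows the general shape of a
# video-reasoning eval loop: run the model per example, compare to a reference.
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Example:
    video_path: str      # path to the input clip
    question: str        # reasoning prompt about the clip
    reference: str       # expected answer

def evaluate(model: Callable[[str, str], str], examples: List[Example]) -> float:
    """Return exact-match accuracy of `model` over `examples`."""
    correct = 0
    for ex in examples:
        prediction = model(ex.video_path, ex.question)
        correct += int(prediction.strip().lower() == ex.reference.strip().lower())
    return correct / max(len(examples), 1)

if __name__ == "__main__":
    # Toy model stub and a single toy example, purely for illustration.
    dummy_model = lambda video, question: "the ball rolls left"
    data = [Example("clip_001.mp4", "What happens after the push?", "The ball rolls left")]
    print(f"accuracy: {evaluate(dummy_model, data):.2f}")
```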
Alternatives and similar repositories for VBVR-EvalKit
Users interested in VBVR-EvalKit are comparing it to the libraries listed below.
- Thinking with Videos from Open-Source Priors. We reproduce chain-of-frames visual reasoning by fine-tuning open-source video models. Give… ☆206 · Updated 3 months ago
- [AAAI 2026] GenMAC for Compositional Text-to-Video Generation ☆31 · Updated 3 weeks ago
- We introduce 'Thinking with Video', a new paradigm leveraging video generation for multimodal reasoning. Our VideoThinkBench shows that S… ☆236 · Updated 3 weeks ago
- ☆118 · Updated 3 months ago
- Official repository for "RLVR-World: Training World Models with Reinforcement Learning" (NeurIPS 2025), https://arxiv.org/abs/2505.13934 ☆204 · Updated 3 months ago
- Official implementation of "Self-Improving Video Generation" ☆78 · Updated 9 months ago
- https://huggingface.co/datasets/multimodal-reasoning-lab/Zebra-CoT ☆117 · Updated 3 months ago
- MetaSpatial leverages reinforcement learning to enhance 3D spatial reasoning in vision-language models (VLMs), enabling more structured, … ☆203 · Updated 8 months ago
- [ICLR 2026] Uni-CoT: Towards Unified Chain-of-Thought Reasoning Across Text and Vision ☆205 · Updated this week
- (NeurIPS 2025 D&B Track) OverLayBench: A Benchmark for Layout-to-Image Generation with Dense Overlaps ☆23 · Updated last week
- We introduce BabyVision, a benchmark revealing the infancy of AI vision. ☆162 · Updated 2 weeks ago
- Cambrian-S: Towards Spatial Supersensing in Video ☆482 · Updated last month
- Holistic Evaluation of Multimodal LLMs on Spatial Intelligence ☆74 · Updated last week
- [World-Model-Survey-2024] Paper list and projects for World Model ☆15 · Updated last year
- [ICLR 2026] MMSI-Bench: A Benchmark for Multi-Image Spatial Intelligence ☆77 · Updated this week
- ACTIVE-O3: Empowering Multimodal Large Language Models with Active Perception via GRPO ☆77 · Updated 2 months ago
- Official repository for "Visual Generation Unlocks Human-Like Reasoning through Multimodal World Models", https://arxiv.org/abs/2601.1983… ☆48 · Updated this week
- Video-Holmes: Can MLLM Think Like Holmes for Complex Video Reasoning? ☆86 · Updated 6 months ago
- ☆116 · Updated 6 months ago
- [NeurIPS 2025] Official Repo of Omni-R1: Reinforcement Learning for Omnimodal Reasoning via Two-System Collaboration ☆113 · Updated last month
- ☆41 · Updated 7 months ago
- ☆311 · Updated last month
- Official repository of PhysMaster: Mastering Physical Representation for Video Generation via Reinforcement Learning ☆58 · Updated 3 months ago
- ☆162 · Updated last year
- [ICLR 2025] Official implementation and benchmark evaluation repository of <PhysBench: Benchmarking and Enhancing Vision-Language Models … ☆83 · Updated last week
- Official repo of paper "Reconstruction Alignment Improves Unified Multimodal Models". Unlocking the Massive Zero-shot Potential in Unifie… ☆351 · Updated 3 weeks ago
- Official repository for "Vid2World: Crafting Video Diffusion Models to Interactive World Models" (ICLR 2026), https://arxiv.org/abs/2505.… ☆34 · Updated this week
- A Large-scale Video Action Dataset ☆376 · Updated 2 weeks ago
- [ICML2025] The code and data of Paper: Towards World Simulator: Crafting Physical Commonsense-Based Benchmark for Video Generation ☆148 · Updated last year
- Visual Embodied Brain: Let Multimodal Large Language Models See, Think, and Control in Spaces ☆87 · Updated 7 months ago