Video-Reason / VMEvalKit

This is a framework for evaluating reasoning in foundation video models.

☆46

Alternatives and similar repositories for VMEvalKit

Users interested in VMEvalKit are comparing it to the libraries listed below.

thuml / MiniVeo3-Reasoner
Thinking with Videos from Open-Source Priors. We reproduce chain-of-frames visual reasoning by fine-tuning open-source video models. Give…
☆202 · Updated 3 months ago
Fr0zenCrane / UniCoT
Uni-CoT: Towards Unified Chain-of-Thought Reasoning Across Text and Vision
☆192 · Updated 2 weeks ago
Video-as-Agent / VideoAgent
Official implementation of "Self-Improving Video Generation"
☆76 · Updated 8 months ago
Karine-Huang / GenMAC
[AAAI 2026] GenMAC for Compositional Text-to-Video Generation
☆30 · Updated this week
PzySeere / MetaSpatial
MetaSpatial leverages reinforcement learning to enhance 3D spatial reasoning in vision-language models (VLMs), enabling more structured, …
☆198 · Updated 8 months ago
physical-superintelligence-lab / PhysBench
[ICLR 2025] Official implementation and benchmark evaluation repository of <PhysBench: Benchmarking and Enhancing Vision-Language Models …
☆83 · Updated 7 months ago
mll-lab-nu / MindCube
☆116 · Updated 2 months ago
tongjingqi / Thinking-with-Video
We introduce 'Thinking with Video', a new paradigm leveraging video generation for multimodal reasoning. Our VideoThinkBench shows that S…
☆234 · Updated this week
thuml / RLVR-World
Official repository for "RLVR-World: Training World Models with Reinforcement Learning" (NeurIPS 2025), https://arxiv.org/abs/2505.13934
☆181 · Updated 2 months ago
cambrian-mllm / cambrian-s
Cambrian-S: Towards Spatial Supersensing in Video
☆468 · Updated 2 weeks ago
mlpc-ucsd / OverLayBench
(NeurIPS 2025 D&B Track) OverLayBench: A Benchmark for Layout-to-Image Generation with Dense Overlaps
☆23 · Updated last month
Gabesarch / grounded-rl
☆113 · Updated 5 months ago
multimodal-reasoning-lab / Bagel-Zebra-CoT
https://huggingface.co/datasets/multimodal-reasoning-lab/Zebra-CoT
☆112 · Updated 2 months ago
UMass-Embodied-AGI / Mirage
Machine Mental Imagery: Empower Multimodal Reasoning with Latent Visual Tokens (arXiv 2025)
☆226 · Updated 5 months ago
zhijie-group / R1-Zero-VSI
☆42 · Updated 7 months ago
phyworld / phyworld
☆161 · Updated last year
KlingTeam / PhysMaster
Official repository of PhysMaster: Mastering Physical Representation for Video Generation via Reinforcement Learning
☆56 · Updated 2 months ago
IranQin / Awesome_World_Model_Papers
[World-Model-Survey-2024] Paper list and projects for World Model
☆15 · Updated last year
egolife-ai / Ego-R1
Ego-R1: Chain-of-Tool-Thought for Ultra-Long Egocentric Video Reasoning
☆134 · Updated 4 months ago
ChenVoid / CombatVLA
[ICCV 2025] CombatVLA: An Efficient Vision-Language-Action Model for Combat Tasks in 3D Action Role-Playing Games
☆31 · Updated last month
weijiawu / Awesome-Visual-Reinforcement-Learning
📖 This is a repository for organizing papers, codes and other resources related to Visual Reinforcement Learning.
☆375 · Updated this week
M-E-AGI-Lab / Muddit
Official Implementation of Muddit [Meissonic II]: Liberating Generation Beyond Text-to-Image with a Unified Discrete Diffusion Model.
☆96 · Updated last week
yliu-cs / SSR
[NeurIPS'25] SSR: Enhancing Depth Perception in Vision-Language Models via Rationale-Guided Spatial Reasoning
☆38 · Updated 2 months ago
InternRobotics / MMSI-Bench
[arXiv 2025] MMSI-Bench: A Benchmark for Multi-Image Spatial Intelligence
☆68 · Updated 2 weeks ago
UMass-Embodied-AGI / MindJourney
[NeurIPS 2025] Source codes for the paper "MindJourney: Test-Time Scaling with World Models for Spatial Reasoning"
☆121 · Updated 2 months ago
TencentARC / SEED-Bench-R1
☆96 · Updated 6 months ago
facebookresearch / Multi-SpatialMLLM
Multi-SpatialMLLM: Multi-Frame Spatial Understanding with Multi-Modal Large Language Models
☆164 · Updated 3 months ago
ML-GSAI / LLaDA-V
☆304 · Updated 3 weeks ago
ziqihuangg / Awesome-From-Video-Generation-to-World-Model
A list of works on video generation towards world model
☆313 · Updated last week
ThinkMorph / ThinkMorph
The official repository for the paper "ThinkMorph: Emergent Properties in Multimodal Interleaved Chain-of-Thought Reasoning"
☆136 · Updated 3 weeks ago