IranQin / Awesome_World_Model_PapersLinks

[World-Model-Survey-2024] Paper list and projects for World Model

☆15

Alternatives and similar repositories for Awesome_World_Model_Papers

Users that are interested in Awesome_World_Model_Papers are comparing it to the libraries listed below

Sorting:

IranQin / MP5
[CVPR2024] This is the official implement of MP5
☆106Updated last year
TencentARC / Moto
[ICCV2025 Oral] Latent Motion Token as the Bridging Language for Learning Robot Manipulation from Videos
☆148Updated last month
thuml / iVideoGPT
Official repository for "iVideoGPT: Interactive VideoGPTs are Scalable World Models" (NeurIPS 2024), https://arxiv.org/abs/2405.15223
☆157Updated 2 months ago
Little-Podi / AdaWorld
[ICML'25] The PyTorch implementation of paper: "AdaWorld: Learning Adaptable World Models with Latent Actions".
☆178Updated 5 months ago
Video-as-Agent / VideoAgent
Official implementation of "Self-Improving Video Generation"
☆75Updated 7 months ago
TencentARC / Video-Holmes
Video-Holmes: Can MLLM Think Like Holmes for Complex Video Reasoning?
☆77Updated 4 months ago
PzySeere / MetaSpatial
MetaSpatial leverages reinforcement learning to enhance 3D spatial reasoning in vision-language models (VLMs), enabling more structured, …
☆193Updated 6 months ago
baaivision / UniVLA
Unified Vision-Language-Action Model
☆233Updated last month
InternRobotics / OST-Bench
[NeurIPS 2025] OST-Bench: Evaluating the Capabilities of MLLMs in Online Spatio-temporal Scene Understanding
☆67Updated 2 months ago
Karine-Huang / GenMAC
☆30Updated 11 months ago
aim-uofa / Omni-R1
[NeurIPS 2025] Official Repo of Omni-R1: Reinforcement Learning for Omnimodal Reasoning via Two-System Collaboration
☆92Updated 5 months ago
Gabesarch / grounded-rl
☆104Updated 4 months ago
M-E-AGI-Lab / Awesome-World-Models
Official Repo of From Masks to Worlds: A Hitchhiker’s Guide to World Models.
☆55Updated last month
TencentARC / SEED-Bench-R1
☆94Updated 5 months ago
zai-org / MotionBench
Official code for MotionBench (CVPR 2025)
☆59Updated 8 months ago
EmbodiedBench / EmbodiedBench
[ICML 2025 Oral] Official repo of EmbodiedBench, a comprehensive benchmark designed to evaluate MLLMs as embodied agents.
☆218Updated last month
mll-lab-nu / MindCube
☆100Updated 3 weeks ago
cambrian-mllm / cambrian-s
Cambrian-S: Towards Spatial Supersensing in Video
☆375Updated 2 weeks ago
thuml / MiniVeo3-Reasoner
Thinking with Videos from Open-Source Priors. We reproduce chain-of-frames visual reasoning by fine-tuning open-source video models. Give…
☆185Updated last month
UMass-Embodied-AGI / Mirage
Machine Mental Imagery: Empower Multimodal Reasoning with Latent Visual Tokens (arXiv 2025)
☆194Updated 3 months ago
JeffWang987 / EgoVid
[Nips 2025] EgoVid-5M: A Large-Scale Video-Action Dataset for Egocentric Video Generation
☆122Updated 4 months ago
zhijie-group / R1-Zero-VSI
☆41Updated 5 months ago
yliu-cs / SSR
[NeurIPS'25] SSR: Enhancing Depth Perception in Vision-Language Models via Rationale-Guided Spatial Reasoning
☆31Updated last month
OpenHelix-Team / VLA-RFT
VLA-RFT: Vision-Language-Action Models with Reinforcement Fine-Tuning
☆86Updated last month
OpenGVLab / VeBrain
Visual Embodied Brain: Let Multimodal Large Language Models See, Think, and Control in Spaces
☆86Updated 5 months ago
Zhoues / RoboRefer
[NeurIPS 2025] Official implementation of "RoboRefer: Towards Spatial Referring with Reasoning in Vision-Language Models for Robotics"
☆202Updated last month
vision-x-nyu / pisa-experiments
Code release for "PISA Experiments: Exploring Physics Post-Training for Video Diffusion Models by Watching Stuff Drop" (ICML 2025)
☆46Updated 6 months ago
MINT-SJTU / STI-Bench
STI-Bench : Are MLLMs Ready for Precise Spatial-Temporal World Understanding?
☆33Updated 4 months ago
zhang9302002 / ThinkingWithVideos
The official code of "Thinking With Videos: Multimodal Tool-Augmented Reinforcement Learning for Long Video Reasoning"
☆64Updated last month
Fr0zenCrane / UniCoT
Uni-CoT: Towards Unified Chain-of-Thought Reasoning Across Text and Vision
☆170Updated 2 weeks ago