QinengWang-Aiden / Awesome-embodied-world-model-papers

A paper list that includes world models or generative video models for embodied agents.

☆25

Alternatives and similar repositories for Awesome-embodied-world-model-papers

Users who are interested in Awesome-embodied-world-model-papers are comparing it to the libraries listed below.


UMass-Embodied-AGI / MindJourney
[NeurIPS 2025] Source codes for the paper "MindJourney: Test-Time Scaling with World Models for Spatial Reasoning"
☆93 · Updated 2 weeks ago
SimWorld-AI / SimWorld
SimWorld: An Open-ended Realistic Simulator for Autonomous Agents in Physical and Social Worlds
☆74 · Updated last week
mll-lab-nu / MindCube
☆100 · Updated 3 weeks ago
YunzeMan / Situation3D
[CVPR 2024] Situational Awareness Matters in 3D Vision Language Reasoning
☆42 · Updated 11 months ago
USC-GVL / PhysBench
[ICLR 2025] Official implementation and benchmark evaluation repository of <PhysBench: Benchmarking and Enhancing Vision-Language Models …
☆75 · Updated 5 months ago
facebookresearch / Multi-SpatialMLLM
Multi-SpatialMLLM: Multi-Frame Spatial Understanding with Multi-Modal Large Language Models
☆158 · Updated last month
JeffWang987 / EgoVid
[NeurIPS 2025] EgoVid-5M: A Large-Scale Video-Action Dataset for Egocentric Video Generation
☆122 · Updated 3 months ago
KlingTeam / PhysMaster
Official repository of PhysMaster: Mastering Physical Representation for Video Generation via Reinforcement Learning
☆52 · Updated last month
sled-group / 3D-GRAND
[CVPR 2025] 3D-GRAND: Towards Better Grounding and Less Hallucination for 3D-LLMs
☆51 · Updated last year
evelinehong / 3D-CLR-Official
[CVPR 2023] Code for "3D Concept Learning and Reasoning from Multi-View Images"
☆82 · Updated last year
richard-guyunqi / BlenderGym-Open
☆19 · Updated 4 months ago
OpenHelix-Team / VLA-RFT
VLA-RFT: Vision-Language-Action Models with Reinforcement Fine-Tuning
☆81 · Updated last month
LogosRoboticsGroup / SPAR
From Flatland to Space (SPAR). Accepted to NeurIPS 2025 Datasets & Benchmarks. A large-scale dataset & benchmark for 3D spatial perceptio…
☆63 · Updated last month
InternRobotics / InternScenes
[NeurIPS 2025] InternScenes: A Large-scale Interactive Indoor Scene Dataset with Realistic Layouts.
☆195 · Updated last month
phyworld / phyworld
☆150 · Updated 10 months ago
video-to-action / video-to-action-release
[ICLR 2025 Spotlight] Grounding Video Models to Actions through Goal Conditioned Exploration
☆58 · Updated 6 months ago
haoyi-duan / WorldScore
Official implementation for WorldScore: A Unified Evaluation Benchmark for World Generation
☆163 · Updated 3 months ago
michaelyuancb / egomono4d
Official repository of "EgoMono4D: Self-Supervised Monocular 4D Scene Reconstruction for Egocentric Videos"
☆38 · Updated last month
Red-Fairy / uOCF-code
[TMLR 2025] The official repository of the paper "Unsupervised Discovery of Object-Centric Neural Fields"
☆17 · Updated 9 months ago
slowfast-vgen / slowfast-vgen
☆21 · Updated last year
wz0919 / EPiC
Official implementation of EPiC: Efficient Video Camera Control Learning with Precise Anchor-Video Guidance
☆45 · Updated 5 months ago
ziqihuangg / Awesome-From-Video-Generation-to-World-Model
A list of works on video generation towards world model
☆210 · Updated last week
OpenM3D / M3DBench
[ECCV 2024] M3DBench introduces a comprehensive 3D instruction-following dataset with support for interleaved multi-modal prompts.
☆61 · Updated last year
World-In-World / world-in-world
Code implementation of the paper "World-in-World: World Models in a Closed-Loop World"
☆90 · Updated last week
video-language-planning / vlp_code
☆77 · Updated 5 months ago
zhaowei-wang-nlp / DivScene
The code of the paper "DivScene: Benchmarking LVLMs for Object Navigation with Diverse Scenes and Objects"
☆18 · Updated 6 months ago
MSR3D / MSR3D
[NeurIPS 2024] Official code repository for MSR3D paper
☆68 · Updated 3 months ago
facebookresearch / univlg
Unifying 2D and 3D Vision-Language Understanding
☆116 · Updated 3 months ago
neu-vi / FleVRS
FleVRS: Towards Flexible Visual Relationship Segmentation, NeurIPS 2024
☆22 · Updated 11 months ago
OpenGVLab / VeBrain
Visual Embodied Brain: Let Multimodal Large Language Models See, Think, and Control in Spaces
☆85 · Updated 5 months ago