QinengWang-Aiden / Awesome-embodied-world-model-papers
A paper list covering world models and generative video models for embodied agents.
☆23 · Updated 4 months ago
Alternatives and similar repositories for Awesome-embodied-world-model-papers
Users interested in Awesome-embodied-world-model-papers are comparing it to the libraries listed below.
- ☆35 · Updated last month
- EgoVid-5M: A Large-Scale Video-Action Dataset for Egocentric Video Generation ☆103 · Updated 6 months ago
- [ICLR 2025] Official implementation and benchmark evaluation repository of PhysBench: Benchmarking and Enhancing Vision-Language Models … ☆59 · Updated 2 months ago
- Program synthesis for 3D spatial reasoning ☆31 · Updated 2 months ago
- [CVPR 2024] Situational Awareness Matters in 3D Vision Language Reasoning ☆37 · Updated 5 months ago
- A list of works on video generation towards world model ☆58 · Updated last week
- Official implementation for WorldScore: A Unified Evaluation Benchmark for World Generation ☆99 · Updated 3 weeks ago
- [CVPR 2025] 3D-GRAND: Towards Better Grounding and Less Hallucination for 3D-LLMs ☆38 · Updated 11 months ago
- Code for paper "Super-CLEVR: A Virtual Benchmark to Diagnose Domain Robustness in Visual Reasoning" ☆36 · Updated last year
- [CVPR 2023] Code for "3D Concept Learning and Reasoning from Multi-View Images" ☆78 · Updated last year
- FleVRS: Towards Flexible Visual Relationship Segmentation, NeurIPS 2024 ☆20 · Updated 5 months ago
- Code for "Chat-3D: Data-efficiently Tuning Large Language Model for Universal Dialogue of 3D Scenes" ☆54 · Updated last year
- [ICLR 2025 Spotlight] Grounding Video Models to Actions through Goal Conditioned Exploration ☆48 · Updated 2 weeks ago
- [arXiv: 2502.05178] QLIP: Text-Aligned Visual Tokenization Unifies Auto-Regressive Multimodal Understanding and Generation ☆73 · Updated 2 months ago
- Code release for "PISA Experiments: Exploring Physics Post-Training for Video Diffusion Models by Watching Stuff Drop" (ICML 2025) ☆29 · Updated last week
- ☆14 · Updated last month
- [ICLR 2024] Official implementation of the paper "Toss: High-quality text-guided novel view synthesis from a single image" ☆22 · Updated last year
- [arXiv 2024] The official repository of the paper "Unsupervised Discovery of Object-Centric Neural Fields" ☆17 · Updated 3 months ago
- Official repository of "EgoMono4D: Self-Supervised Monocular 4D Scene Reconstruction for Egocentric Videos" ☆21 · Updated last month
- ☆15 · Updated last week
- Agent-to-Sim: Learning Interactive Behavior from Casual Videos ☆43 · Updated 7 months ago
- Training-free Guidance in Text-to-Video Generation via Multimodal Planning and Structured Noise Initialization ☆20 · Updated last month
- [CVPR'24 Highlight] The official code and data for paper "EgoThink: Evaluating First-Person Perspective Thinking Capability of Vision-Lan… ☆58 · Updated last month
- Source code for my homepage. ☆12 · Updated 3 months ago
- Code for Commonsense-T2I Challenge: Can Text-to-Image Generation Models Understand Commonsense? [COLM 2024] ☆20 · Updated 9 months ago
- MetaSpatial leverages reinforcement learning to enhance 3D spatial reasoning in vision-language models (VLMs), enabling more structured, … ☆117 · Updated last week
- The official implementation of the paper "Exploring the Potential of Encoder-free Architectures in 3D LMMs" ☆51 · Updated this week
- Official repo of the ICLR 2025 paper "MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos" ☆27 · Updated 7 months ago
- ☆126 · Updated 4 months ago
- Diffusion Powers Video Tokenizer for Comprehension and Generation (CVPR 2025) ☆67 · Updated 2 months ago