PatrickHua / Awesome-World-Models
This repository is a collection of research papers on World Models.
☆37 · Updated last year
Alternatives and similar repositories for Awesome-World-Models:
Users who are interested in Awesome-World-Models are comparing it to the repositories listed below.
- A paper list of world models ☆25 · Updated 8 months ago
- Code for paper "Grounding Video Models to Actions through Goal Conditioned Exploration". ☆37 · Updated 3 weeks ago
- EgoVid-5M: A Large-Scale Video-Action Dataset for Egocentric Video Generation ☆90 · Updated 2 months ago
- GRAPE: Guided-Reinforced Vision-Language-Action Preference Optimization ☆64 · Updated last month
- Code for "Chat-3D: Data-efficiently Tuning Large Language Model for Universal Dialogue of 3D Scenes" ☆49 · Updated 9 months ago
- HandsOnVLM: Vision-Language Models for Hand-Object Interaction Prediction ☆22 · Updated 3 weeks ago
- Emma-X: An Embodied Multimodal Action Model with Grounded Chain of Thought and Look-ahead Spatial Reasoning ☆36 · Updated last week
- Code for FLIP: Flow-Centric Generative Planning for General-Purpose Manipulation Tasks ☆36 · Updated last month
- IMProv: Inpainting-based Multimodal Prompting for Computer Vision Tasks ☆59 · Updated 3 months ago
- Code for Stable Control Representations ☆23 · Updated 2 weeks ago
- LAPA: Latent Action Pretraining from Videos ☆136 · Updated 3 weeks ago
- Codebase for HiP ☆88 · Updated last year
- [CVPR 2023] Code for "3D Concept Learning and Reasoning from Multi-View Images" ☆76 · Updated 11 months ago
- LogiCity@NeurIPS'24, D&B track. A multi-agent inductive learning environment for "abstractions". ☆19 · Updated 2 months ago
- [CVPR'24 Highlight] The official code and data for paper "EgoThink: Evaluating First-Person Perspective Thinking Capability of Vision-Lan… ☆55 · Updated last month
- Official code of paper "DeeR-VLA: Dynamic Inference of Multimodal Large Language Models for Efficient Robot Execution" ☆54 · Updated 2 months ago
- Code for MultiPLY: A Multisensory Object-Centric Embodied Large Language Model in 3D World ☆124 · Updated 2 months ago
- 📱👉🏠 Perform conditional procedural generation to generate houses like your own! ☆34 · Updated last year
- Slot-TTA shows that test-time adaptation using slot-centric models can improve image segmentation on out-of-distribution examples. ☆26 · Updated last year
- [ICCV 2023] Understanding 3D Object Interaction from a Single Image ☆41 · Updated 10 months ago
- [NeurIPS 2024] CLOVER: Closed-Loop Visuomotor Control with Generative Expectation for Robotic Manipulation ☆88 · Updated last month