InternRobotics / MesaTask
[NeurIPS 2025 Spotlight] MesaTask: Towards Task-Driven Tabletop Scene Generation via 3D Spatial Reasoning
☆41Updated 2 weeks ago
Alternatives and similar repositories for MesaTask
Users who are interested in MesaTask compare it to the repositories listed below.
- CVPR 2025☆34Updated 6 months ago
- [NeurIPS 2025] EgoVid-5M: A Large-Scale Video-Action Dataset for Egocentric Video Generation☆119Updated 2 months ago
- [CVPR 2024] Situational Awareness Matters in 3D Vision Language Reasoning☆42Updated 10 months ago
- [NeurIPS 2025] OST-Bench: Evaluating the Capabilities of MLLMs in Online Spatio-temporal Scene Understanding☆60Updated 2 weeks ago
- [NeurIPS 2025] InternScenes: A Large-scale Interactive Indoor Scene Dataset with Realistic Layouts.☆181Updated this week
- [ICLR 2024] Seer: Language Instructed Video Prediction with Latent Diffusion Models☆33Updated last year
- [ARXIV’25] Learning Video Generation for Robotic Manipulation with Collaborative Trajectory Control☆81Updated 3 months ago
- A list of works on video generation towards world model☆167Updated 2 months ago
- ☆37Updated last year
- Code implementation of the paper 'FIction: 4D Future Interaction Prediction from Video'☆15Updated 6 months ago
- [NeurIPS 24] The implementation and dataset of LiveScene: Language Embedding Interactive Radiance Fields for Physical Scene Rendering and…☆56Updated 6 months ago
- Official implementation for WorldScore: A Unified Evaluation Benchmark for World Generation☆149Updated 2 months ago
- Source codes for the paper "MindJourney: Test-Time Scaling with World Models for Spatial Reasoning"☆85Updated 2 months ago
- Official Repository of "EgoMono4D: Self-Supervised Monocular 4D Scene Reconstruction for Egocentric Videos"☆35Updated 3 weeks ago
- ☆21Updated 11 months ago
- [ICLR 2025 Spotlight] Grounding Video Models to Actions through Goal Conditioned Exploration☆56Updated 5 months ago
- Generative World Explorer☆157Updated 4 months ago
- Official Implementation of paper "Telling Left from Right: Identifying Geometry-Aware Semantic Correspondence"☆135Updated 2 months ago
- [NeurIPS 2025] VideoREPA: Learning Physics for Video Generation through Relational Alignment with Foundation Models☆80Updated this week
- Multi-SpatialMLLM: Multi-Frame Spatial Understanding with Multi-Modal Large Language Models☆155Updated last week
- [CVPR-2025] GREAT: Geometry-Intention Collaborative Inference for Open-Vocabulary 3D Object Affordance Grounding☆25Updated 2 months ago
- [NeurIPS 2024] Official code repository for MSR3D paper☆64Updated 2 months ago
- [ICML 2024] A Touch, Vision, and Language Dataset for Multimodal Alignment☆84Updated 4 months ago
- ☆90Updated 2 weeks ago
- [AAAI 2025] DreamPhysics: Learning Physics-Based 3D Dynamics with Video Diffusion Priors☆214Updated last year
- Self-reimplemented version of 4D-LRM.☆59Updated 4 months ago
- Official code for "Amodal Completion via Progressive Mixed Context Diffusion" [CVPR 2024 Highlight]☆51Updated last year
- A comprehensive list of papers investigating physical cognition in video generation, including papers, codes, and related websites.☆183Updated last week
- ☆50Updated 5 months ago
- ☆143Updated 9 months ago