UMass-Embodied-AGI / MindJourney
Source code for the paper "MindJourney: Test-Time Scaling with World Models for Spatial Reasoning"
☆86 · Updated 2 months ago
Alternatives and similar repositories for MindJourney
Users who are interested in MindJourney are comparing it to the repositories listed below.
- Multi-SpatialMLLM Multi-Frame Spatial Understanding with Multi-Modal Large Language Models ☆154 · Updated 4 months ago
- [CVPR 2024] Situational Awareness Matters in 3D Vision Language Reasoning ☆41 · Updated 10 months ago
- Unifying 2D and 3D Vision-Language Understanding ☆108 · Updated 2 months ago
- [NeurIPS 2025] InternScenes: A Large-scale Interactive Indoor Scene Dataset with Realistic Layouts. ☆176 · Updated 3 weeks ago
- Official Repository of "EgoMono4D: Self-Supervised Monocular 4D Scene Reconstruction for Egocentric Videos" ☆34 · Updated 2 weeks ago
- ☆90 · Updated last week
- Geometry-aware 4D Video Generation for Robot Manipulation ☆60 · Updated last month
- [NeurIPS 24] The implementation and dataset of LiveScene: Language Embedding Interactive Radiance Fields for Physical Scene Rendering and… ☆56 · Updated 6 months ago
- [NeurIPS 2025] OST-Bench: Evaluating the Capabilities of MLLMs in Online Spatio-temporal Scene Understanding ☆59 · Updated last week
- [ICLR 2025 Spotlight] Grounding Video Models to Actions through Goal Conditioned Exploration ☆55 · Updated 5 months ago
- [ICLR 2025] Official implementation and benchmark evaluation repository of <PhysBench: Benchmarking and Enhancing Vision-Language Models … ☆72 · Updated 4 months ago
- VLA-RFT: Vision-Language-Action Models with Reinforcement Fine-Tuning ☆45 · Updated this week
- [NeurIPS 2025] EgoVid-5M: A Large-Scale Video-Action Dataset for Egocentric Video Generation ☆118 · Updated 2 months ago
- [CVPR 2025 highlight] Generating 6DoF Object Manipulation Trajectories from Action Description in Egocentric Vision ☆29 · Updated 2 weeks ago
- [ARXIV’25] Learning Video Generation for Robotic Manipulation with Collaborative Trajectory Control ☆81 · Updated 3 months ago
- [CVPR 2025 Highlight] Towards Autonomous Micromobility through Scalable Urban Simulation ☆116 · Updated 2 months ago
- ☆51 · Updated last year
- [CVPR 2025] 3D-GRAND: Towards Better Grounding and Less Hallucination for 3D-LLMs ☆46 · Updated last year
- The official repo for the paper "VQ-VLA: Improving Vision-Language-Action Models via Scaling Vector-Quantized Action Tokenizers" (ICCV 2025) ☆81 · Updated last month
- Generative World Explorer ☆156 · Updated 3 months ago
- IKEA Manuals at Work: 4D Grounding of Assembly Instructions on Internet Videos ☆49 · Updated 6 months ago
- [NeurIPS 2024] Official code repository for MSR3D paper ☆64 · Updated 2 months ago
- Implementation of Prompting with the Future: Open-World Model Predictive Control with Interactive Digital Twins (RSS 2025) ☆40 · Updated last month
- Sim-to-real and CDM inference code for ManipAsInSim project. ☆89 · Updated 2 weeks ago
- [ICLR 2025] SPA: 3D Spatial-Awareness Enables Effective Embodied Representation ☆166 · Updated 3 months ago
- ☆18 · Updated last year
- ☆34 · Updated 5 months ago
- [ICLR 2025] Where Am I and What Will I See: An Auto-Regressive Model for Spatial Localization and View Prediction ☆39 · Updated 2 months ago
- Code for "BoxDreamer: Dreaming Box Corners for Generalizable Object Pose Estimation", ICCV 2025. ☆85 · Updated this week
- SceneFun3D ToolKit ☆157 · Updated 5 months ago