facebookresearch / Multi-SpatialMLLM
Multi-SpatialMLLM: Multi-Frame Spatial Understanding with Multi-Modal Large Language Models
☆123 Updated last month
Alternatives and similar repositories for Multi-SpatialMLLM
Users who are interested in Multi-SpatialMLLM are comparing it to the repositories listed below.
- VLM-3R: Vision-Language Models Augmented with Instruction-Aligned 3D Reconstruction ☆167 Updated 2 weeks ago
- Official implementation for WorldScore: A Unified Evaluation Benchmark for World Generation ☆110 Updated this week
- ☆47 Updated last month
- EgoVid-5M: A Large-Scale Video-Action Dataset for Egocentric Video Generation ☆108 Updated 7 months ago
- Unifying 2D and 3D Vision-Language Understanding ☆86 Updated 2 months ago
- Code for "BoxDreamer: Dreaming Box Corners for Generalizable Object Pose Estimation", arXiv 2025. ☆62 Updated 2 months ago
- A list of works on video generation towards world model ☆151 Updated this week
- [CVPR 2024] Situational Awareness Matters in 3D Vision Language Reasoning ☆38 Updated 6 months ago
- [ICLR 2025] Where Am I and What Will I See: An Auto-Regressive Model for Spatial Localization and View Prediction ☆36 Updated 4 months ago
- [ARXIV’25] Learning Video Generation for Robotic Manipulation with Collaborative Trajectory Control ☆64 Updated 3 weeks ago
- Official implementation of Spatial-MLLM: Boosting MLLM Capabilities in Visual-based Spatial Intelligence ☆235 Updated this week
- [CVPR 2025 Highlight] Towards Autonomous Micromobility through Scalable Urban Simulation ☆49 Updated last week
- WideRange4D: Enabling High-Quality 4D Reconstruction with Wide-Range Movements and Scenes ☆92 Updated 3 months ago
- 4D Panoptic Scene Graph Generation (NeurIPS'23 Spotlight) ☆109 Updated 3 months ago
- [ICLR 2025] MVTokenFlow: High-quality 4D Content Generation using Multiview Token Flow ☆22 Updated 2 months ago
- [ICLR 2025] Dataset and Code for Paper "Learning to Generate Diverse Pedestrian Movements from Web Videos with Noisy Labels" ☆40 Updated 2 months ago
- Official Repository of "EgoMono4D: Self-Supervised Monocular 4D Scene Reconstruction for Egocentric Videos" ☆24 Updated 2 months ago
- Official implementation of EPiC: Efficient Video Camera Control Learning with Precise Anchor-Video Guidance ☆34 Updated 3 weeks ago
- Seeing World Dynamics in a Nutshell ☆109 Updated 3 months ago
- Open-source video dataset with dynamic scenes and camera movement annotations ☆61 Updated 2 months ago
- SpaceR: The first MLLM empowered by SG-RLVR for video spatial reasoning ☆63 Updated 2 weeks ago
- [ICLR 2025] Official Implementation of M3: 3D-Spatial Multimodal Memory ☆164 Updated 2 months ago
- UniGeo: Taming Video Diffusion for Unified Consistent Geometry Estimation ☆110 Updated 2 weeks ago
- StreamSplat: Towards Online Dynamic 3D Reconstruction from Uncalibrated Video Streams ☆44 Updated 2 weeks ago
- OmniSpatial: Towards Comprehensive Spatial Reasoning Benchmark for Vision Language Models ☆40 Updated 3 weeks ago
- [CVPR 2025] Uni4D: Unifying Visual Foundation Models for 4D Modeling from a Single Video ☆153 Updated last month
- [ECCV 2024] M3DBench introduces a comprehensive 3D instruction-following dataset with support for interleaved multi-modal prompts. ☆60 Updated 8 months ago
- ☆70 Updated last week
- Voyager: Long-Range and World-Consistent Video Diffusion for Explorable 3D Scene Generation ☆75 Updated 3 weeks ago
- Self-reimplemented version of 4D-LRM. ☆30 Updated 3 weeks ago