[NeurIPS 2025 Spotlight] Official implementation of Spatial-MLLM: Boosting MLLM Capabilities in Visual-based Spatial Intelligence
☆464 · Feb 5, 2026 · Updated 3 months ago
Alternatives and similar repositories for Spatial-MLLM
Users that are interested in Spatial-MLLM are comparing it to the libraries listed below.
- [CVPR 2026] VLM-3R: Vision-Language Models Augmented with Instruction-Aligned 3D Reconstruction ☆397 · Apr 23, 2026 · Updated 3 weeks ago
- The code for paper 'Learning from Videos for 3D World: Enhancing MLLMs with 3D Vision Geometry Priors' ☆234 · Nov 28, 2025 · Updated 5 months ago
- [CVPR 2026] Multi-SpatialMLLM: Multi-Frame Spatial Understanding with Multi-Modal Large Language Models ☆177 · Feb 25, 2026 · Updated 2 months ago
- [ICCV 2025] LangScene-X: Reconstruct Generalizable 3D Language-Embedded Scenes with TriMap Video Diffusion ☆304 · Jul 15, 2025 · Updated 10 months ago
- SpaceR: The first MLLM empowered by SG-RLVR for video spatial reasoning ☆112 · Jul 9, 2025 · Updated 10 months ago
- Official repo and evaluation implementation of VSI-Bench ☆708 · Aug 5, 2025 · Updated 9 months ago
- UniGeo: Taming Video Diffusion for Unified Consistent Geometry Estimation ☆136 · Jun 10, 2025 · Updated 11 months ago
- [NeurIPS 2025] Streaming 3D Reconstruction with Explicit Spatial Pointer Memory ☆186 · Mar 10, 2026 · Updated 2 months ago
- ICCV 2025 | TesserAct: Learning 4D Embodied World Models ☆392 · Aug 4, 2025 · Updated 9 months ago
- [ICCV'25] Ross3D: Reconstructive Visual Instruction Tuning with 3D-Awareness ☆69 · Jul 22, 2025 · Updated 9 months ago
- [CVPR 2026 Spotlight] SpatialScore: Towards Comprehensive Evaluation for Spatial Intelligence ☆71 · Apr 17, 2026 · Updated last month
- [CVPR 2025 Highlight] VideoScene: Distilling Video Diffusion Model to Generate 3D Scenes in One Step ☆352 · Jul 4, 2025 · Updated 10 months ago
- [ICLR 2026] OmniWorld: A Multi-Domain and Multi-Modal Dataset for 4D World Modeling ☆467 · Apr 16, 2026 · Updated last month
- [ICLR 2026] Streaming 4D Visual Geometry Transformer ☆911 · Oct 27, 2025 · Updated 6 months ago
- [ICLR 2025 Oral] Official Implementation for "Do Vision-Language Models Represent Space and How? Evaluating Spatial Frame of Reference Un… ☆21 · Oct 24, 2024 · Updated last year
- [CVPR 2025] The code for paper "Video-3D LLM: Learning Position-Aware Video Representation for 3D Scene Understanding". ☆213 · Jun 4, 2025 · Updated 11 months ago
- UniUGG: Unified 3D Understanding and Generation via Geometric-Semantic Encoding. Accepted to ICLR 2026. ☆62 · Aug 19, 2025 · Updated 9 months ago
- [ICCV 2025 ⭐highlight⭐] Implementation of VMem: Consistent Interactive Video Scene Generation with Surfel-Indexed View Memory ☆428 · Jul 25, 2025 · Updated 9 months ago
- A paper list for spatial reasoning