facebookresearch/Multi-SpatialMLLM

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/facebookresearch/Multi-SpatialMLLM)

facebookresearch / Multi-SpatialMLLM

[CVPR 2026] Multi-SpatialMLLM: Multi-Frame Spatial Understanding with Multi-Modal Large Language Models

☆178

Alternatives and similar repositories for Multi-SpatialMLLM

Users that are interested in Multi-SpatialMLLM are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

THU-SI / Spatial-MLLM
View on GitHub
[NeurIPS 2025 Spotlight] Official implementation of Spatial-MLLM: Boosting MLLM Capabilities in Visual-based Spatial Intelligence
☆479Feb 5, 2026Updated 5 months ago
InternRobotics / MMSI-Bench
View on GitHub
[ICLR 2026] MMSI-Bench: A Benchmark for Multi-Image Spatial Intelligence
☆103Apr 28, 2026Updated 2 months ago
VITA-Group / VLM-3R
View on GitHub
[CVPR 2026] VLM-3R: Vision-Language Models Augmented with Instruction-Aligned 3D Reconstruction
☆428Updated this week
LaVi-Lab / VG-LLM
View on GitHub
The code for paper 'Learning from Videos for 3D World: Enhancing MLLMs with 3D Vision Geometry Priors'
☆246Nov 28, 2025Updated 7 months ago
Yangr116 / VST
View on GitHub
[ECCV2026] Visual Spatial Tuning
☆199Mar 25, 2026Updated 3 months ago
Deploy to Railway using AI coding agents - Free Credits Offer • Ad
Use Claude Code, Codex, OpenCode, and more. Autonomous software development now has the infrastructure to match with Railway.
LogosRoboticsGroup / SPAR
View on GitHub
From Flatland to Space (SPAR). Accepted to NeurIPS 2025 Datasets & Benchmarks. A large-scale dataset & benchmark for 3D spatial perceptio…
☆90Jan 5, 2026Updated 6 months ago
liruilong940607 / prope
View on GitHub
Cameras as Relative Positional Encoding
☆740Dec 18, 2025Updated 7 months ago
InternRobotics / Aether
View on GitHub
[ICCV 2025 & ICCV 2025 RIWM Outstanding Paper] Aether: Geometric-Aware Unified World Modeling
☆604Oct 26, 2025Updated 8 months ago
qizekun / OmniSpatial
View on GitHub
[ICLR 2026] OmniSpatial: Towards Comprehensive Spatial Reasoning Benchmark for Vision Language Models
☆88Jan 21, 2026Updated 6 months ago
InternRobotics / OST-Bench
View on GitHub
[NeurIPS 2025] OST-Bench: Evaluating the Capabilities of MLLMs in Online Spatio-temporal Scene Understanding
☆79Sep 29, 2025Updated 9 months ago
haoningwu3639 / SpatialScore
View on GitHub
[CVPR 2026 Highlight] SpatialScore: Towards Comprehensive Evaluation for Spatial Intelligence
☆84May 28, 2026Updated last month
wzzheng / StreamVGGT
View on GitHub
[ICLR 2026] Streaming 4D Visual Geometry Transformer
☆942Oct 27, 2025Updated 8 months ago
fudan-zvg / UniUGG
View on GitHub
UniUGG: Unified 3D Understanding and Generation via Geometric-Semantic Encoding. Accepted to ICLR 2026.
☆63Updated this week
ML-GSAI / FlexWorld
View on GitHub
Official PyTorch implementation for "FlexWorld: Progressively Expanding 3D Scenes for Flexiable-View Synthesis".
☆132Sep 11, 2025Updated 10 months ago
1-Click AI Models by DigitalOcean Gradient • Ad
Deploy popular AI models on DigitalOcean Gradient GPU virtual machines with just a single click. Zero configuration with optimized deployments.
OuyangKun10 / SpaceR
View on GitHub
SpaceR: The first MLLM empowered by SG-RLVR for video spatial reasoning
☆111Jul 9, 2025Updated last year
KAIST-Visual-AI-Group / APC-VLM
View on GitHub
[ICCV 2025] Official code for Perspective-Aware Reasoning in Vision-Language Models via Mental Imagery Simulation
☆66Sep 12, 2025Updated 10 months ago
NJU-3DV / SpatialVID
View on GitHub
[CVPR 2026] SpatialVID: A Large-Scale Video Dataset with Spatial Annotations
☆587Apr 22, 2026Updated 3 months ago
Visual-AI / 3DRS
View on GitHub
[NeurIPS 2025] 3DRS: MLLMs Need 3D-Aware Representation Supervision for Scene Understanding
☆158Dec 9, 2025Updated 7 months ago
facebookresearch / univlg
View on GitHub
Unifying 2D and 3D Vision-Language Understanding
☆126Jul 2, 2026Updated 2 weeks ago
yyfz / Pi3
View on GitHub
[ICLR 2026] π^3: Permutation-Equivariant Visual Geometry Learning
☆2,080Jul 3, 2026Updated 2 weeks ago
yangzhou24 / OmniWorld
View on GitHub
[ICLR 2026] OmniWorld: A Multi-Domain and Multi-Modal Dataset for 4D World Modeling
☆485Apr 16, 2026Updated 3 months ago
ZCMax / LLaVA-3D
View on GitHub
[ICCV 2025] A Simple yet Effective Pathway to Empowering LLaVA to Understand and Interact with 3D World
☆385Oct 21, 2025Updated 9 months ago
GAP-LAB-CUHK-SZ / stable-sim2real
View on GitHub
Stable-Sim2Real: Exploring Simulation of Real-Captured 3D Data with Two-Stage Depth Diffusion (ICCV 2025 Highlight)
☆31Mar 15, 2026Updated 4 months ago
End-to-end encrypted email - Proton Mail • Ad
Special offer: 40% Off Yearly / 80% Off First Month. All Proton services are open source and independently audited for security.
facebookresearch / DepthLM_Official
View on GitHub
[ICLR 2026 Oral (top 1.2%)] Official implementation of DepthLM
☆362Jun 1, 2026Updated last month
facebookresearch / 4DGT
View on GitHub
[NeurIPS 2025 (Spotlight)] The implementation for the paper "4DGT Learning a 4D Gaussian Transformer Using Real-World Monocular Videos"
☆467Sep 19, 2025Updated 10 months ago
CUT3R / CUT3R
View on GitHub
Official implementation of Continuous 3D Perception Model with Persistent State
☆1,465Aug 27, 2025Updated 10 months ago
chenguolin / MoVieS
View on GitHub
[CVPR 2026] Official implementation of "MoVieS: Motion-Aware 4D Dynamic View Synthesis in One Second".
☆460Mar 19, 2026Updated 4 months ago
AnjieCheng / SpatialRGPT
View on GitHub
[NeurIPS'24] This repository is the implementation of "SpatialRGPT: Grounded Spatial Reasoning in Vision Language Models"
☆335Dec 14, 2024Updated last year
vision-x-nyu / thinking-in-space
View on GitHub
Official repo and evaluation implementation of VSI-Bench
☆732Aug 5, 2025Updated 11 months ago
manycore-research / SpatialLM
View on GitHub
[NeurIPS 2025] SpatialLM: Training Large Language Models for Structured Indoor Modeling
☆4,620Jun 26, 2026Updated 3 weeks ago
ByteDance-Seed / SpatialTree
View on GitHub
CVPR 2026 (Highlight); Spatial Intelligence; MLLMs
☆48Feb 24, 2026Updated 4 months ago
InternRobotics / G2VLM
View on GitHub
[CVPR 2026] G2VLM: Geometry Grounded Vision Language Model with Unified 3D Reconstruction and Spatial Reasoning
☆345Apr 18, 2026Updated 3 months ago
1-Click AI Models by DigitalOcean Gradient • Ad
Deploy popular AI models on DigitalOcean Gradient GPU virtual machines with just a single click. Zero configuration with optimized deployments.
Haochen-Wang409 / ross3d
View on GitHub
[ICCV'25] Ross3D: Reconstructive Visual Instruction Tuning with 3D-Awareness
☆70Jul 22, 2025Updated last year
TencentARC / GeometryCrafter
View on GitHub
[ICCV 2025] GeometryCrafter: Consistent Geometry Estimation for Open-world Videos with Diffusion Priors
☆451Oct 2, 2025Updated 9 months ago
THU-SI / LangScene-X
View on GitHub
[ICCV 2025] LangScene-X: Reconstruct Generalizable 3D Language-Embedded Scenes with TriMap Video Diffusion
☆302Jul 15, 2025Updated last year
W-Ted / F3D-Gaus
View on GitHub
Official code for paper: F3D-Gaus: Feed-forward 3D-aware Generation on ImageNet with Cycle-Aggregative Gaussian Splatting
☆52Mar 11, 2025Updated last year
henry123-boy / SpaTrackerV2
View on GitHub
[ICCV 2025] SpatialTrackerV2: 3D Point Tracking Made Easy
☆984Feb 27, 2026Updated 4 months ago
BAAI-DCAI / SpatialBot
View on GitHub
The official repo for "SpatialBot: Precise Spatial Understanding with Vision Language Models.
☆349Sep 14, 2025Updated 10 months ago
SOTAMak1r / DeepVerse
View on GitHub
DeepVerse: 4D Autoregressive Video Generation as a World Model
☆231Aug 11, 2025Updated 11 months ago