diankun-wu/Spatial-MLLM

Official implementation of Spatial-MLLM: Boosting MLLM Capabilities in Visual-based Spatial Intelligence

☆438

Alternatives and similar repositories for Spatial-MLLM

Users who are interested in Spatial-MLLM are comparing it to the repositories listed below.

VITA-Group / VLM-3R
[CVPR 2026] VLM-3R: Vision-Language Models Augmented with Instruction-Aligned 3D Reconstruction
☆345 · Feb 24, 2026 · Updated last week
LaVi-Lab / VG-LLM
The code for paper 'Learning from Videos for 3D World: Enhancing MLLMs with 3D Vision Geometry Priors'
☆204 · Nov 28, 2025 · Updated 3 months ago
liuff19 / LangScene-X
[ICCV 2025] LangScene-X: Reconstruct Generalizable 3D Language-Embedded Scenes with TriMap Video Diffusion
☆296 · Jul 15, 2025 · Updated 7 months ago
facebookresearch / Multi-SpatialMLLM
[CVPR 2026] Multi-SpatialMLLM: Multi-Frame Spatial Understanding with Multi-Modal Large Language Models
☆170 · Feb 25, 2026 · Updated last week
vision-x-nyu / thinking-in-space
Official repo and evaluation implementation of VSI-Bench
☆675 · Aug 5, 2025 · Updated 6 months ago
YkiWu / Point3R
[NeurIPS 2025] Streaming 3D Reconstruction with Explicit Spatial Pointer Memory
☆179 · Sep 26, 2025 · Updated 5 months ago
SunYangtian / UniGeo
UniGeo: Taming Video Diffusion for Unified Consistent Geometry Estimation
☆135 · Jun 10, 2025 · Updated 8 months ago
UMass-Embodied-AGI / TesserAct
ICCV 2025 | TesserAct: Learning 4D Embodied World Models
☆380 · Aug 4, 2025 · Updated 7 months ago
OuyangKun10 / SpaceR
SpaceR: The first MLLM empowered by SG-RLVR for video spatial reasoning
☆104 · Jul 9, 2025 · Updated 7 months ago
LaVi-Lab / Video-3D-LLM
[CVPR 2025] The code for paper "Video-3D LLM: Learning Position-Aware Video Representation for 3D Scene Understanding".
☆200 · Jun 4, 2025 · Updated 9 months ago
yangzhou24 / OmniWorld
[ICLR 2026] OmniWorld: A Multi-Domain and Multi-Modal Dataset for 4D World Modeling
☆433 · Feb 25, 2026 · Updated last week
Haochen-Wang409 / ross3d
[ICCV'25] Ross3D: Reconstructive Visual Instruction Tuning with 3D-Awareness
☆67 · Jul 22, 2025 · Updated 7 months ago
haoningwu3639 / SpatialScore
[CVPR 2026] SpatialScore: Towards Comprehensive Evaluation for Spatial Intelligence
☆63 · Jul 9, 2025 · Updated 7 months ago
sled-group / COMFORT
[ICLR 2025 Oral] Official Implementation for "Do Vision-Language Models Represent Space and How? Evaluating Spatial Frame of Reference Un…
☆21 · Oct 24, 2024 · Updated last year
wzzheng / StreamVGGT
[ICLR 2026] Streaming 4D Visual Geometry Transformer
☆832 · Oct 27, 2025 · Updated 4 months ago
hanyang-21 / VideoScene
[CVPR 2025 Highlight] VideoScene: Distilling Video Diffusion Model to Generate 3D Scenes in One Step
☆346 · Jul 4, 2025 · Updated 8 months ago
fudan-zvg / UniUGG
UniUGG: Unified 3D Understanding and Generation via Geometric-Semantic Encoding
☆60 · Aug 19, 2025 · Updated 6 months ago
InternRobotics / EmbodiedScan
[CVPR 2024 & NeurIPS 2024] EmbodiedScan: A Holistic Multi-Modal 3D Perception Suite Towards Embodied AI
☆652 · Jun 13, 2025 · Updated 8 months ago
mll-lab-nu / Awesome-Spatial-Intelligence-in-VLM
A paper list for spatial reasoning
☆661 · Jan 19, 2026 · Updated last month
CUT3R / CUT3R
Official implementation of Continuous 3D Perception Model with Persistent State
☆1,345 · Aug 27, 2025 · Updated 6 months ago
AIGeeksGroup / 3D-R1
3D-R1: Enhancing Reasoning in 3D VLMs for Unified Scene Understanding
☆398 · Dec 22, 2025 · Updated 2 months ago
ActiveVisionLab / Awesome-LLM-3D
Awesome-LLM-3D: a curated list of Multi-modal Large Language Model in 3D world Resources
☆2,120 · Feb 3, 2026 · Updated last month
jzr99 / Geo4D
[ICCV 2025 Highlight] Geo4D: Leveraging Video Generators for Geometric 4D Scene Reconstruction
☆412 · Jun 6, 2025 · Updated 8 months ago
facebookresearch / DepthLM_Official
[ICLR 2026 Oral (top 1.2%)] Official implementation of DepthLM
☆312 · Updated this week
ant-research / FLARE
☆703 · May 1, 2025 · Updated 10 months ago
MaureenZOU / m3-spatial
[ICLR 2025] Official Implementation of M3: 3D-Spatial Multimodal Memory
☆198 · Apr 26, 2025 · Updated 10 months ago
AnjieCheng / SpatialRGPT
[NeurIPS'24] This repository is the implementation of "SpatialRGPT: Grounded Spatial Reasoning in Vision Language Models"
☆313 · Dec 14, 2024 · Updated last year
yyfz / Pi3
[ICLR 2026] π^3: Permutation-Equivariant Visual Geometry Learning
☆1,673 · Updated this week
manycore-research / SpatialLM
[NeurIPS 2025] SpatialLM: Training Large Language Models for Structured Indoor Modeling
☆4,245 · Sep 26, 2025 · Updated 5 months ago
runjiali-rl / vmem
[ICCV 2025 ⭐highlight⭐] Implementation of VMem: Consistent Interactive Video Scene Generation with Surfel-Indexed View Memory
☆417 · Jul 25, 2025 · Updated 7 months ago
SpatialVision / Orient-Anything
Orient Anything, ICML 2025
☆374 · Feb 6, 2026 · Updated 3 weeks ago
ZCMax / LLaVA-3D
[ICCV 2025] A Simple yet Effective Pathway to Empowering LLaVA to Understand and Interact with 3D World
☆373 · Oct 21, 2025 · Updated 4 months ago
facebookresearch / fast3r
[CVPR 2025] Fast3R: Towards 3D Reconstruction of 1000+ Images in One Forward Pass
☆1,510 · May 7, 2025 · Updated 9 months ago
NVlabs / LSM
[NeurIPS'24] Large Spatial Model: End-to-end Unposed Images to Semantic 3D
☆227 · Feb 11, 2026 · Updated 3 weeks ago
hujiecpp / PE3R
PE3R: Perception-Efficient 3D Reconstruction. Take 2 - 3 photos with your phone, upload them, wait a few minutes, and then start explorin…
☆399 · Apr 1, 2025 · Updated 11 months ago
Visual-AI / 3DRS
[NeurIPS 2025] 3DRS: MLLMs Need 3D-Aware Representation Supervision for Scene Understanding
☆149 · Dec 9, 2025 · Updated 2 months ago
facebookresearch / locate-3d
Open source repo for Locate 3D Model, 3D-JEPA and Locate 3D Dataset
☆412 · Jun 3, 2025 · Updated 9 months ago
QitaoZhao / DiffusionSfM
[CVPR 2025] "DiffusionSfM: Predicting Structure and Motion via Ray Origin and Endpoint Diffusion" official implementation.
☆182 · Jul 7, 2025 · Updated 7 months ago
Yangr116 / VST
Visual Spatial Tuning
☆176 · Feb 19, 2026 · Updated 2 weeks ago