mll-lab-nu/Awesome-Spatial-Intelligence-in-VLM

mll-lab-nu / Awesome-Spatial-Intelligence-in-VLM

A paper list for spatial reasoning

☆767

Alternatives and similar repositories for Awesome-Spatial-Intelligence-in-VLM

Users who are interested in Awesome-Spatial-Intelligence-in-VLM are comparing it to the repositories listed below.

THU-SI / Spatial-MLLM
[NeurIPS 2025 Spotlight] Official implementation of Spatial-MLLM: Boosting MLLM Capabilities in Visual-based Spatial Intelligence
☆480 · Feb 5, 2026 · Updated 5 months ago
VITA-Group / VLM-3R
[CVPR 2026] VLM-3R: Vision-Language Models Augmented with Instruction-Aligned 3D Reconstruction
☆431 · Jul 15, 2026 · Updated 2 weeks ago
mll-lab-nu / MindCube
☆164 · Mar 23, 2026 · Updated 4 months ago
cambrian-mllm / cambrian-s
Cambrian-S: Towards Spatial Supersensing in Video
☆563 · Apr 3, 2026 · Updated 3 months ago
vision-x-nyu / thinking-in-space
Official repo and evaluation implementation of VSI-Bench
☆734 · Aug 5, 2025 · Updated 11 months ago
LaVi-Lab / VG-LLM
The code for paper 'Learning from Videos for 3D World: Enhancing MLLMs with 3D Vision Geometry Priors'
☆248 · Nov 28, 2025 · Updated 8 months ago
OpenSenseNova / SenseNova-SI
[CVPR 2026] Scaling Spatial Intelligence with Multimodal Foundation Models
☆293 · May 14, 2026 · Updated 2 months ago
zhengxuJosh / Awesome-Multimodal-Spatial-Reasoning
This repository collects and organises state‑of‑the‑art papers on spatial reasoning for Multimodal Vision–Language Models (MVLMs).
☆319 · Feb 17, 2026 · Updated 5 months ago
yukangcao / Awesome-4D-Spatial-Intelligence
A curated list of awesome papers for reconstructing 4D spatial intelligence from video. (arXiv 2507.21045)
☆515 · Jun 5, 2026 · Updated last month
zhangquanchen / 3DThinker
[CVPR 2026] Think with 3D: Geometric Imagination Grounded Spatial Reasoning from Limited Views
☆244 · May 7, 2026 · Updated 2 months ago
UMass-Embodied-AGI / MindJourney
[NeurIPS 2025] Source codes for the paper "MindJourney: Test-Time Scaling with World Models for Spatial Reasoning"
☆151 · Nov 4, 2025 · Updated 8 months ago
knightnemo / Awesome-World-Models
A Curated List of Awesome Works in World Modeling, Aiming to Serve as a One-stop Resource for Researchers, Practitioners, and Enthusiasts…
☆3,244 · Updated this week
InternRobotics / G2VLM
[CVPR 2026] G2VLM: Geometry Grounded Vision Language Model with Unified 3D Reconstruction and Spatial Reasoning
☆347 · Apr 18, 2026 · Updated 3 months ago
Yangr116 / VST
[ECCV 2026] Visual Spatial Tuning
☆200 · Mar 25, 2026 · Updated 4 months ago
InternRobotics / MMSI-Bench
[ICLR 2026] MMSI-Bench: A Benchmark for Multi-Image Spatial Intelligence
☆106 · Apr 28, 2026 · Updated 3 months ago
OuyangKun10 / SpaceR
SpaceR: The first MLLM empowered by SG-RLVR for video spatial reasoning
☆111 · Jul 9, 2025 · Updated last year
zhaochen0110 / Awesome_Think_With_Images
Resources and paper list for "Thinking with Images for LVLMs". This repository accompanies our survey on how LVLMs can leverage visual in…
☆1,499 · Mar 9, 2026 · Updated 4 months ago
gca-spatial-reasoning / gca
Official Implementation of "Geometrically-Constrained Agent for Spatial Reasoning"
☆91 · Apr 7, 2026 · Updated 3 months ago
InternLM / Spatial-SSRL
[CVPR 2026] Official release of "Spatial-SSRL: Enhancing Spatial Understanding via Self-Supervised Reinforcement Learning"
☆133 · Apr 7, 2026 · Updated 3 months ago
Zhoues / RoboRefer
[NeurIPS 2025] Official implementation of "RoboRefer: Towards Spatial Referring with Reasoning in Vision-Language Models for Robotics"
☆265 · Dec 16, 2025 · Updated 7 months ago
EvolvingLMMs-Lab / EASI
Holistic Evaluation of Multimodal LLMs on Spatial Intelligence
☆119 · Jul 1, 2026 · Updated 3 weeks ago
LaVi-Lab / Video-3D-LLM
[CVPR 2025] The code for paper "Video-3D LLM: Learning Position-Aware Video Representation for 3D Scene Understanding".
☆220 · Jun 4, 2025 · Updated last year
NJU-3DV / SpatialVID
[CVPR 2026] SpatialVID: A Large-Scale Video Dataset with Spatial Annotations
☆589 · Apr 22, 2026 · Updated 3 months ago
cambrian-mllm / cambrian-p
Cambrian-P: Pose-Grounded Video Understanding
☆102 · Updated this week
THU-SI / Spatial-TTT
[ECCV 2026] Spatial-TTT: Streaming Visual-based Spatial Intelligence with Test-Time Training
☆245 · Jun 19, 2026 · Updated last month
facebookresearch / DepthLM_Official
[ICLR 2026 Oral (top 1.2%)] Official implementation of DepthLM
☆363 · Jun 1, 2026 · Updated last month
LogosRoboticsGroup / SPAR
From Flatland to Space (SPAR). Accepted to NeurIPS 2025 Datasets & Benchmarks. A large-scale dataset & benchmark for 3D spatial perceptio…
☆90 · Jan 5, 2026 · Updated 6 months ago
ActiveVisionLab / Awesome-LLM-3D
Awesome-LLM-3D: a curated list of Multi-modal Large Language Model in 3D world Resources
☆2,242 · Apr 16, 2026 · Updated 3 months ago
ZJU-REAL / SpatialLadder
[ICLR 2026] SpatialLadder: Progressive Training for Spatial Reasoning in Vision-Language Models
☆99 · Jun 9, 2026 · Updated last month
AnjieCheng / SpatialRGPT
[NeurIPS'24] This repository is the implementation of "SpatialRGPT: Grounded Spatial Reasoning in Vision Language Models"
☆336 · Dec 14, 2024 · Updated last year
Visionary-Laboratory / holi-spatial
[ICML 2026 Oral] Holi-Spatial: Evolving Video Streams into Holistic 3D Spatial Intelligence
☆372 · Updated this week
yyfz / Pi3
[ICLR 2026] π^3: Permutation-Equivariant Visual Geometry Learning
☆2,094 · Jul 3, 2026 · Updated 3 weeks ago
mll-lab-nu / Theory-of-Space
THEORY OF SPACE: a benchmark for evaluating whether foundation models can actively explore under partial observability efficiently to bui…
☆85 · Feb 27, 2026 · Updated 5 months ago
SIBench / Awesome-Visual-Spatial-Reasoning
This is a project about visual spatial reasoning.
☆144 · Updated this week
ShijieZhou-UCLA / VLM4D
[ICCV 2025] VLM4D: Towards Spatiotemporal Awareness in Vision Language Models
☆55 · Nov 20, 2025 · Updated 8 months ago
AIGeeksGroup / 3D-R1
3D-R1: Enhancing Reasoning in 3D VLMs for Unified Scene Understanding
☆414 · Jul 20, 2026 · Updated last week
arijitray1993 / awesome-spatial-reasoning
Collection of the latest spatial, 3D, and video/temporal reasoning papers
☆36 · Sep 29, 2025 · Updated 10 months ago
ZCMax / LLaVA-3D
[ICCV 2025] A Simple yet Effective Pathway to Empowering LLaVA to Understand and Interact with 3D World
☆387 · Oct 21, 2025 · Updated 9 months ago
liudaizong / Awesome-3D-Visual-Grounding
😎 up-to-date & curated list of awesome 3D Visual Grounding papers, methods & resources.
☆283 · Jan 14, 2026 · Updated 6 months ago