zhengxuJosh/Awesome-Multimodal-Spatial-Reasoning

This repository collects and organises state‑of‑the‑art papers on spatial reasoning for Multimodal Vision–Language Models (MVLMs).

☆319

Alternatives and similar repositories for Awesome-Multimodal-Spatial-Reasoning

Users interested in Awesome-Multimodal-Spatial-Reasoning also compare it with the repositories listed below.

zhengxuJosh / SAM4SS
SAM4SS: Tailoring SAM and SAM2 for Semantic Segmentation
☆11 · Jul 31, 2024 · Updated last year
zhengxuJosh / AnySeg
Code & Weights for “Learning Robust Anymodal Segmentor with Unimodal and Cross-modal Distillation”
☆15 · Dec 6, 2024 · Updated last year
mll-lab-nu / Awesome-Spatial-Intelligence-in-VLM
A paper list for spatial reasoning
☆767 · Jan 19, 2026 · Updated 6 months ago
EnVision-Research / PAP
Panoramic Affordance Prediction (PAP) (ECCV 2026)
☆46 · Jun 29, 2026 · Updated last month
Chenfei-Liao / Multi-Modal-Semantic-Segmentation-Robustness-Benchmark
[CVPR Workshop Best Paper Award] Benchmarking Multi-modal Semantic Segmentation under Sensor Failures: Missing and Noisy Modality Robustn…
☆19 · Nov 4, 2025 · Updated 8 months ago
Chenfei-Liao / MemorySAM
MemorySAM: Memorize Modalities and Semantics with Segment Anything Model 2 for Multi-modal Semantic Segmentation
☆45 · Nov 4, 2025 · Updated 8 months ago
OpenGVLab / NaViL
☆95 · Oct 10, 2025 · Updated 9 months ago
LJungang / Awesome-Video-Reasoning-Landscape
🔥An open-source survey of the latest video reasoning tasks, paradigms, and benchmarks.
☆190 · Jun 14, 2026 · Updated last month
Yui010206 / Adaptive-Visual-Imagination-Control
When and How Much to Imagine: Adaptive Test-Time Scaling with World Models for Visual Spatial Reasoning
☆18 · Jun 2, 2026 · Updated last month
Chenfei-Liao / VTC-Bench
[ACL2026 Main] Data & Code of "Are We Using the Right Benchmark: An Evaluation Framework for Visual Token Compression Methods"
☆35 · Apr 9, 2026 · Updated 3 months ago
InternLM / Spatial-SSRL
[CVPR 2026] Official release of "Spatial-SSRL: Enhancing Spatial Understanding via Self-Supervised Reinforcement Learning"
☆133 · Apr 7, 2026 · Updated 3 months ago
EnVision-Research / PhysToolBench
PhysToolBench: Benchmarking Physical Tool Understanding for MLLMs
☆30 · Jul 20, 2026 · Updated last week
AntResearchNLP / ViLaSR
[NeurIPS 2025] Reinforcing Spatial Reasoning in Vision-Language Models with Interwoven Thinking and Visual Drawing
☆98 · Jul 27, 2025 · Updated last year
QC-LY / UiG
Code for "Understanding-in-Generation: Reinforcing Generative Capability of Unified Model via Infusing Understanding into Generation"
☆15 · Nov 11, 2025 · Updated 8 months ago
mengcaopku / SpatialDreamer
SpatialDreamer: Incentivizing Spatial Reasoning via Active Mental Imagery
☆15 · Feb 1, 2026 · Updated 5 months ago
zhengxuJosh / Awesome-Streaming-Video-Avatar
Awesome-Streaming-Video-Avatar
☆21 · Updated this week
mll-lab-nu / ViewAgent
☆20 · Updated this week
zhengxuJosh / Awesome-RAG-Vision
Awesome-RAG-Vision: a curated list of advanced retrieval augmented generation (RAG) for Computer Vision
☆339 · Jan 25, 2026 · Updated 6 months ago
yukangcao / Awesome-4D-Spatial-Intelligence
A curated list of awesome papers for reconstructing 4D spatial intelligence from video. (arXiv 2507.21045)
☆515 · Jun 5, 2026 · Updated last month
PeiwenSun2000 / SpaceVista
The official repo for SpaceVista: All-Scale Visual Spatial Reasoning from mm to km.
☆43 · May 26, 2026 · Updated 2 months ago
zhengxuJosh / DPPASS
☆12 · Jul 24, 2023 · Updated 3 years ago
mll-lab-nu / MindCube
☆164 · Mar 23, 2026 · Updated 4 months ago
UMass-Embodied-AGI / MindJourney
[NeurIPS 2025] Source codes for the paper "MindJourney: Test-Time Scaling with World Models for Spatial Reasoning"
☆151 · Nov 4, 2025 · Updated 8 months ago
cambrian-mllm / cambrian-s
Cambrian-S: Towards Spatial Supersensing in Video
☆563 · Apr 3, 2026 · Updated 3 months ago
7zk1014 / PanoEnv
☆15 · Jun 21, 2026 · Updated last month
EnVision-Research / LatentMorph
[ICML 2026] LatentMorph: Morphing Latent Reasoning into Image Generation
☆47 · May 5, 2026 · Updated 2 months ago
gca-spatial-reasoning / gca
Official Implementation of "Geometrically-Constrained Agent for Spatial Reasoning"
☆91 · Apr 7, 2026 · Updated 3 months ago
zhengxuJosh / 360SFUDA
Code for Panoramic Semantic Segmentation
☆16 · Apr 26, 2024 · Updated 2 years ago
SalesforceAIResearch / strefer
Strefer: Empowering Video LLMs with Space-Time Referring and Reasoning via Synthetic Instruction Data
☆19 · Jun 2, 2026 · Updated last month
VITA-Group / VLM-3R
[CVPR 2026] VLM-3R: Vision-Language Models Augmented with Instruction-Aligned 3D Reconstruction
☆431 · Jul 15, 2026 · Updated 2 weeks ago
OpenSenseNova / SenseNova-SI
[CVPR 2026] Scaling Spatial Intelligence with Multimodal Foundation Models
☆293 · May 14, 2026 · Updated 2 months ago
WHB139426 / TAB-Agent
Think, Act, Build: An Agentic Framework with Vision Language Models for Zero-Shot 3D Visual Grounding
☆26 · Apr 5, 2026 · Updated 3 months ago
arijitray1993 / awesome-spatial-reasoning
Collection of the latest spatial, 3D, and video/temporal reasoning papers
☆36 · Sep 29, 2025 · Updated 10 months ago
ZJU-REAL / SpatialLadder
[ICLR 2026] SpatialLadder: Progressive Training for Spatial Reasoning in Vision-Language Models
☆99 · Jun 9, 2026 · Updated last month
EnVision-Research / A4-Agent
A4-Agent: An Agentic Framework for Zero-Shot Affordance Reasoning (ECCV 2026)
☆41 · Jun 29, 2026 · Updated last month
LaVi-Lab / Rethink_CoT_Video
Official code for "Rethinking Chain-of-Thought Reasoning for Videos"
☆21 · Dec 14, 2025 · Updated 7 months ago
FoundationAgents / ReCode
Next paradigm for LLM agents. Unifies planning and action through recursive code generation for adaptive, human-like decision-making.
☆559 · Apr 21, 2026 · Updated 3 months ago
zhangzaibin / spagent
SPAgent, a foundation agent for understanding, reasoning over, and operating within the physical and spatial world.
☆210 · Updated this week
mll-lab-nu / Theory-of-Space
THEORY OF SPACE: a benchmark for evaluating whether foundation models can actively explore under partial observability efficiently to bui…
☆85 · Feb 27, 2026 · Updated 5 months ago