AnjieCheng/SpatialRGPT

[NeurIPS'24] This repository is the implementation of "SpatialRGPT: Grounded Spatial Reasoning in Vision Language Models"

☆335

Alternatives and similar repositories for SpatialRGPT

Users who are interested in SpatialRGPT are comparing it to the repositories listed below.

remyxai / VQASynth
Compose multimodal datasets 🎹
☆580 · Updated this week
BAAI-DCAI / SpatialBot
The official repo for "SpatialBot: Precise Spatial Understanding with Vision Language Models."
☆349 · Sep 14, 2025 · Updated 10 months ago
vision-x-nyu / thinking-in-space
Official repo and evaluation implementation of VSI-Bench
☆732 · Aug 5, 2025 · Updated 11 months ago
jiayuww / SpatialEval
[NeurIPS'24] SpatialEval: a benchmark to evaluate spatial reasoning abilities of MLLMs and LLMs
☆61 · Jan 23, 2025 · Updated last year
sled-group / COMFORT
[ICLR 2025 Oral] Official Implementation for "Do Vision-Language Models Represent Space and How? Evaluating Spatial Frame of Reference Un…
☆22 · Oct 24, 2024 · Updated last year
FatemehShiri / Spatial-MM
☆12 · Jan 10, 2025 · Updated last year
qizekun / OmniSpatial
[ICLR 2026] OmniSpatial: Towards Comprehensive Spatial Reasoning Benchmark for Vision Language Models
☆88 · Jan 21, 2026 · Updated 6 months ago
wufeim / SpatialReasonerDataGen
Synthetic VQA data generation code for SpatialReasoner.
☆20 · Nov 25, 2025 · Updated 7 months ago
THU-SI / Spatial-MLLM
[NeurIPS 2025 Spotlight] Official implementation of Spatial-MLLM: Boosting MLLM Capabilities in Visual-based Spatial Intelligence
☆479 · Feb 5, 2026 · Updated 5 months ago
VITA-Group / VLM-3R
[CVPR 2026] VLM-3R: Vision-Language Models Augmented with Instruction-Aligned 3D Reconstruction
☆428 · Updated this week
wentaoyuan / RoboPoint
A Vision-Language Model for Spatial Affordance Prediction in Robotics
☆227 · Jul 17, 2025 · Updated last year
3dlg-hcvc / vigil3d
ViGiL3D: A Linguistically Diverse Dataset for 3D Visual Grounding
☆19 · Aug 8, 2025 · Updated 11 months ago
Zhoues / RoboRefer
[NeurIPS 2025] Official implementation of "RoboRefer: Towards Spatial Referring with Reasoning in Vision-Language Models for Robotics"
☆263 · Dec 16, 2025 · Updated 7 months ago
OuyangKun10 / SpaceR
SpaceR: The first MLLM empowered by SG-RLVR for video spatial reasoning
☆111 · Jul 9, 2025 · Updated last year
johnson111788 / SpatialReasoner
Training recipe for SpatialReasoner [NeurIPS 2025]
☆45 · Apr 5, 2026 · Updated 3 months ago
Yangr116 / VST
[ECCV 2026] Visual Spatial Tuning
☆198 · Mar 25, 2026 · Updated 3 months ago
PeiwenSun2000 / SpaceVista
The official repo for SpaceVista: All-Scale Visual Spatial Reasoning from mm to km.
☆43 · May 26, 2026 · Updated last month
ZCMax / LLaVA-3D
[ICCV 2025] A Simple yet Effective Pathway to Empowering LLaVA to Understand and Interact with 3D World
☆384 · Oct 21, 2025 · Updated 9 months ago
SpatialVision / Orient-Anything
Orient Anything, ICML 2025
☆389 · Feb 6, 2026 · Updated 5 months ago
W-Ted / N3D-VLM
Official code for paper: N3D-VLM: Native 3D Grounding Enables Accurate Spatial Reasoning in Vision-Language Models
☆116 · Jan 14, 2026 · Updated 6 months ago
KAIST-Visual-AI-Group / APC-VLM
[ICCV 2025] Official code for Perspective-Aware Reasoning in Vision-Language Models via Mental Imagery Simulation
☆66 · Sep 12, 2025 · Updated 10 months ago
Haochen-Wang409 / ross3d
[ICCV'25] Ross3D: Reconstructive Visual Instruction Tuning with 3D-Awareness
☆70 · Jul 22, 2025 · Updated 11 months ago
shiqichen17 / AdaptVis
GitHub repository for "Why Is Spatial Reasoning Hard for VLMs? An Attention Mechanism Perspective on Focus Areas" (ICML 2025)
☆76 · May 2, 2025 · Updated last year
AnjieCheng / SR-3D
[ICLR'26] This repository is the implementation of "3D Aware Region Prompted Vision Language Model"
☆28 · Feb 19, 2026 · Updated 5 months ago
scene-verse / SceneVerse
Official implementation of ECCV24 paper "SceneVerse: Scaling 3D Vision-Language Learning for Grounded Scene Understanding"
☆287 · Mar 19, 2025 · Updated last year
mengcaopku / SpatialDreamer
SpatialDreamer: Incentivizing Spatial Reasoning via Active Mental Imagery
☆15 · Feb 1, 2026 · Updated 5 months ago
LaVi-Lab / VG-LLM
The code for paper 'Learning from Videos for 3D World: Enhancing MLLMs with 3D Vision Geometry Priors'
☆245 · Nov 28, 2025 · Updated 7 months ago
ActiveVisionLab / Awesome-LLM-3D
Awesome-LLM-3D: a curated list of resources on multimodal large language models in the 3D world
☆2,238 · Apr 16, 2026 · Updated 3 months ago
AIGeeksGroup / 3D-R1
3D-R1: Enhancing Reasoning in 3D VLMs for Unified Scene Understanding
☆414 · Updated this week
LogosRoboticsGroup / SPAR
From Flatland to Space (SPAR). Accepted to NeurIPS 2025 Datasets & Benchmarks. A large-scale dataset & benchmark for 3D spatial perceptio…
☆90 · Jan 5, 2026 · Updated 6 months ago
SJTU-DENG-Lab / R1-Zero-VSI
☆42 · Jun 9, 2025 · Updated last year
wangsen99 / LMEE
(CVPR 26) Explore with Long-term Memory: A Benchmark and Multimodal LLM-based Reinforcement Learning Framework for Embodied Exploration
☆35 · Mar 8, 2026 · Updated 4 months ago
wookiekim / CorrespondentDream
Official PyTorch implementation of CorrespondentDream: Enhancing 3D Fidelity of Text-to-3D using Cross-View Correspondences (CVPR 2024 Po…
☆19 · Apr 29, 2024 · Updated 2 years ago
UMass-Embodied-AGI / 3D-Mem
[CVPR 2025] Source code for the paper "3D-Mem: 3D Scene Memory for Embodied Exploration and Reasoning"
☆266 · Oct 2, 2025 · Updated 9 months ago
neu-vi / struct2d
Code release for 'Struct2D: A Perception-Guided Framework for Spatial Reasoning in MLLMs' (NeurIPS 2025)
☆31 · Oct 28, 2025 · Updated 8 months ago
SpatialVLA / SpatialVLA
🔥 SpatialVLA: a spatial-enhanced vision-language-action model trained on 1.1 million real robot episodes. Accepted at RSS 2025.
☆706 · Jun 23, 2025 · Updated last year
mll-lab-nu / Theory-of-Space
THEORY OF SPACE: a benchmark for evaluating whether foundation models can actively explore under partial observability efficiently to bui…
☆85 · Feb 27, 2026 · Updated 4 months ago
mbanani / probe3d
[CVPR 2024] Probing the 3D Awareness of Visual Foundation Models
☆354 · Dec 1, 2025 · Updated 7 months ago
mind-palace-laeqa / benchmark
☆17 · Oct 31, 2025 · Updated 8 months ago