[NeurIPS'24] This repository is the implementation of "SpatialRGPT: Grounded Spatial Reasoning in Vision Language Models"
☆317 · Dec 14, 2024 · Updated last year
Alternatives and similar repositories for SpatialRGPT
Users who are interested in SpatialRGPT are comparing it to the repositories listed below.
- Compose multimodal datasets ☆550 · Jan 5, 2026 · Updated 2 months ago
- The official repo for "SpatialBot: Precise Spatial Understanding with Vision Language Models." ☆338 · Sep 14, 2025 · Updated 6 months ago
- Official repo and evaluation implementation of VSI-Bench ☆682 · Aug 5, 2025 · Updated 7 months ago
- [NeurIPS'24] SpatialEval: a benchmark to evaluate spatial reasoning abilities of MLLMs and LLMs ☆59 · Jan 23, 2025 · Updated last year
- [ICLR 2025 Oral] Official Implementation for "Do Vision-Language Models Represent Space and How? Evaluating Spatial Frame of Reference Un… ☆21 · Oct 24, 2024 · Updated last year
- ☆12 · Jan 10, 2025 · Updated last year
- A Vision-Language Model for Spatial Affordance Prediction in Robotics ☆214 · Jul 17, 2025 · Updated 8 months ago
- [ICCV 2025] Official code for Perspective-Aware Reasoning in Vision-Language Models via Mental Imagery Simulation ☆57 · Sep 12, 2025 · Updated 6 months ago
- [NeurIPS 2025] Official implementation of Spatial-MLLM: Boosting MLLM Capabilities in Visual-based Spatial Intelligence ☆451 · Feb 5, 2026 · Updated last month
- Orient Anything, ICML 2025 ☆376 · Feb 6, 2026 · Updated last month
- [ICCV 2025] A Simple yet Effective Pathway to Empowering LLaVA to Understand and Interact with 3D World ☆374 · Oct 21, 2025 · Updated 5 months ago
- [ICCV'25] Ross3D: Reconstructive Visual Instruction Tuning with 3D-Awareness ☆67 · Jul 22, 2025 · Updated 8 months ago
- Code release for 'Struct2D: A Perception-Guided Framework for Spatial Reasoning in MLLMs' (NeurIPS 2025) ☆30 · Oct 28, 2025 · Updated 4 months ago
- Training recipe for SpatialReasoner [NeurIPS 2025] ☆41 · Updated this week
- ViGiL3D: A Linguistically Diverse Dataset for 3D Visual Grounding ☆17 · Aug 8, 2025 · Updated 7 months ago
- Github repository for "Why Is Spatial Reasoning Hard for VLMs? An Attention Mechanism Perspective on Focus Areas" (ICML 2025) ☆71 · May 2, 2025 · Updated 10 months ago
- Official implementation of ECCV24 paper "SceneVerse: Scaling 3D Vision-Language Learning for Grounded Scene Understanding" ☆278 · Mar 19, 2025 · Updated last year
- Awesome-LLM-3D: a curated list of Multi-modal Large Language Model in 3D world Resources ☆2,127 · Feb 3, 2026 · Updated last month
- Official code for paper: N3D-VLM: Native 3D Grounding Enables Accurate Spatial Reasoning in Vision-Language Models ☆93 · Jan 14, 2026 · Updated 2 months ago
- ☆41 · Jun 9, 2025 · Updated 9 months ago
- SpaceR: The first MLLM empowered by SG-RLVR for video spatial reasoning ☆107 · Jul 9, 2025 · Updated 8 months ago
- Official PyTorch implementation of CorrespondentDream: Enhancing 3D Fidelity of Text-to-3D using Cross-View Correspondences (CVPR 2024 Po… ☆19 · Apr 29, 2024 · Updated last year
- A paper list for spatial reasoning ☆683 · Jan 19, 2026 · Updated 2 months ago
- [CVPR 2024] Probing the 3D Awareness of Visual Foundation Models ☆348 · Dec 1, 2025 · Updated 3 months ago
- From Flatland to Space (SPAR). Accepted to NeurIPS 2025 Datasets & Benchmarks. A large-scale dataset & benchmark for 3D spatial perceptio… ☆81 · Jan 5, 2026 · Updated 2 months ago
- Code for "Chat-Scene: Bridging 3D Scene and Large Language Models with Object Identifiers" (NeurIPS 2024) ☆206 · Oct 20, 2025 · Updated 5 months ago
- [TACL'23] VSR: A probing benchmark for spatial understanding of vision-language models. ☆140 · Mar 25, 2023 · Updated 2 years ago
- Code for paper "Super-CLEVR: A Virtual Benchmark to Diagnose Domain Robustness in Visual Reasoning" ☆47 · Feb 19, 2026 · Updated last month
- [CVPR 2026] VLM-3R: Vision-Language Models Augmented with Instruction-Aligned 3D Reconstruction ☆362 · Mar 9, 2026 · Updated last week
- [NeurIPS 2025] SpatialLM: Training Large Language Models for Structured Indoor Modeling ☆4,283 · Sep 26, 2025 · Updated 5 months ago
- [NeurIPS 2024] MSR3D: Advanced Situated Reasoning in 3D Scenes ☆70 · Dec 2, 2025 · Updated 3 months ago
- Tame a Wild Camera: In-the-Wild Monocular Camera Calibration ☆88 · Dec 28, 2023 · Updated 2 years ago
- [CVPR 2024] Open3DSG: Open-Vocabulary 3D Scene Graphs from Point Clouds with Queryable Objects and Open-Set Relationships ☆146 · Sep 16, 2024 · Updated last year
- Spatial Aptitude Training for Multimodal Language Models ☆25 · Feb 8, 2026 · Updated last month
- [CVPR 2025] The code for paper "Video-3D LLM: Learning Position-Aware Video Representation for 3D Scene Understanding". ☆206 · Jun 4, 2025 · Updated 9 months ago
- [CoRL 2024] VLM-Grounder: A VLM Agent for Zero-Shot 3D Visual Grounding ☆129 · May 22, 2025 · Updated 10 months ago
- ☆12 · Apr 18, 2025 · Updated 11 months ago
- ☆13 · Mar 28, 2025 · Updated 11 months ago
- [ECCV 2024 Best Paper Candidate & TPAMI 2025] PointLLM: Empowering Large Language Models to Understand Point Clouds ☆984 · Updated this week