[NeurIPS'24] This repository is the implementation of "SpatialRGPT: Grounded Spatial Reasoning in Vision Language Models"
★326 · Dec 14, 2024 · Updated last year
Alternatives and similar repositories for SpatialRGPT
Users that are interested in SpatialRGPT are comparing it to the libraries listed below.
- Compose multimodal datasets ★572 · Jun 1, 2026 · Updated last week
- The official repo for "SpatialBot: Precise Spatial Understanding with Vision Language Models." ★345 · Sep 14, 2025 · Updated 8 months ago
- Official repo and evaluation implementation of VSI-Bench ★724 · Aug 5, 2025 · Updated 10 months ago
- [NeurIPS'24] SpatialEval: a benchmark to evaluate spatial reasoning abilities of MLLMs and LLMs ★60 · Jan 23, 2025 · Updated last year
- [ICLR 2025 Oral] Official Implementation for "Do Vision-Language Models Represent Space and How? Evaluating Spatial Frame of Reference Un… ★21 · Oct 24, 2024 · Updated last year
- ★12 · Jan 10, 2025 · Updated last year
- A Vision-Language Model for Spatial Affordance Prediction in Robotics ★223 · Jul 17, 2025 · Updated 10 months ago
- [ICCV 2025] Official code for Perspective-Aware Reasoning in Vision-Language Models via Mental Imagery Simulation ★62 · Sep 12, 2025 · Updated 8 months ago
- [NeurIPS 2025 Spotlight] Official implementation of Spatial-MLLM: Boosting MLLM Capabilities in Visual-based Spatial Intelligence ★468 · Feb 5, 2026 · Updated 4 months ago
- [ICCV 2025] A Simple yet Effective Pathway to Empowering LLaVA to Understand and Interact with 3D World ★379 · Oct 21, 2025 · Updated 7 months ago
- Orient Anything, ICML 2025 ★385 · Feb 6, 2026 · Updated 4 months ago
- [ICCV'25] Ross3D: Reconstructive Visual Instruction Tuning with 3D-Awareness ★70 · Jul 22, 2025 · Updated 10 months ago
- Code release for 'Struct2D: A Perception-Guided Framework for Spatial Reasoning in MLLMs' (NeurIPS 2025) ★30 · Oct 28, 2025 · Updated 7 months ago
- Training recipe for SpatialReasoner [NeurIPS 2025] ★45 · Apr 5, 2026 · Updated 2 months ago
- GitHub repository for "Why Is Spatial Reasoning Hard for VLMs? An Attention Mechanism Perspective on Focus Areas" (ICML 2025) ★75 · May 2, 2025 · Updated last year
- ViGiL3D: A Linguistically Diverse Dataset for 3D Visual Grounding ★19 · Aug 8, 2025 · Updated 10 months ago
- Official implementation of ECCV24 paper "SceneVerse: Scaling 3D Vision-Language Learning for Grounded Scene Understanding" ★283 · Mar 19, 2025 · Updated last year
- Awesome-LLM-3D: a curated list of Multi-modal Large Language Model in 3D world Resources ★2,219 · Apr 16, 2026 · Updated last month
- ★42 · Jun 9, 2025 · Updated last year
- SpaceR: The first MLLM empowered by SG-RLVR for video spatial reasoning ★112 · Jul 9, 2025 · Updated 11 months ago
- Official PyTorch implementation of CorrespondentDream: Enhancing 3D Fidelity of Text-to-3D using Cross-View Correspondences (CVPR 2024 Po… ★19 · Apr 29, 2024 · Updated 2 years ago
- Official code for paper: N3D-VLM: Native 3D Grounding Enables Accurate Spatial Reasoning in Vision-Language Models ★108 · Jan 14, 2026 · Updated 4 months ago
- [CVPR 2024] Probing the 3D Awareness of Visual Foundation Models ★350 · Dec 1, 2025 · Updated 6 months ago
- [NeurIPS 2024 & TPAMI 2026] Chat-Scene: Bridging 3D Scene and Large Language Models with Object Identifiers ★212 · Apr 12, 2026 · Updated last month
- A paper list for spatial reasoning ★752 · Jan 19, 2026 · Updated 4 months ago
- [TACL'23] VSR: A probing benchmark for spatial understanding of vision-language models. ★147 · Mar 25, 2023 · Updated 3 years ago
- From Flatland to Space (SPAR). Accepted to NeurIPS 2025 Datasets & Benchmarks. A large-scale dataset & benchmark for 3D spatial perceptio… ★87 · Jan 5, 2026 · Updated 5 months ago
- [CVPR 2024] Open3DSG: Open-Vocabulary 3D Scene Graphs from Point Clouds with Queryable Objects and Open-Set Relationships ★159 · Sep 16, 2024 · Updated last year
- Code for paper "Super-CLEVR: A Virtual Benchmark to Diagnose Domain Robustness in Visual Reasoning" ★47 · Feb 19, 2026 · Updated 3 months ago
- [CVPR 2026] VLM-3R: Vision-Language Models Augmented with Instruction-Aligned 3D Reconstruction ★409 · Apr 23, 2026 · Updated last month
- [NeurIPS 2024] MSR3D: Advanced Situated Reasoning in 3D Scenes ★73 · Dec 2, 2025 · Updated 6 months ago
- Tame a Wild Camera: In-the-Wild Monocular Camera Calibration ★89 · Dec 28, 2023 · Updated 2 years ago
- [NeurIPS 2025] SpatialLM: Training Large Language Models for Structured Indoor Modeling ★4,582 · Sep 26, 2025 · Updated 8 months ago
- [CoRL 2024] VLM-Grounder: A VLM Agent for Zero-Shot 3D Visual Grounding ★133 · May 22, 2025 · Updated last year
- [CVPR 2025] The code for paper "Video-3D LLM: Learning Position-Aware Video Representation for 3D Scene Understanding". ★216 · Jun 4, 2025 · Updated last year
- ★12 · Apr 18, 2025 · Updated last year
- ★13 · Mar 28, 2025 · Updated last year
- [3DV 2026] Open Vocabulary Monocular 3D Object Detection ★93 · Apr 29, 2026 · Updated last month
- Code for 3D-LLM: Injecting the 3D World into Large Language Models ★1,201 · Jun 6, 2024 · Updated 2 years ago