[NeurIPS'24] This repository is the implementation of "SpatialRGPT: Grounded Spatial Reasoning in Vision Language Models"
☆320 · Dec 14, 2024 · Updated last year
Alternatives and similar repositories for SpatialRGPT
Users that are interested in SpatialRGPT are comparing it to the libraries listed below.
- Compose multimodal datasets ☆561 · Updated this week
- The official repo for "SpatialBot: Precise Spatial Understanding with Vision Language Models." ☆345 · Sep 14, 2025 · Updated 7 months ago
- Official repo and evaluation implementation of VSI-Bench ☆706 · Aug 5, 2025 · Updated 8 months ago
- [NeurIPS'24] SpatialEval: a benchmark to evaluate spatial reasoning abilities of MLLMs and LLMs ☆60 · Jan 23, 2025 · Updated last year
- [ICLR 2025 Oral] Official Implementation for "Do Vision-Language Models Represent Space and How? Evaluating Spatial Frame of Reference Un… ☆21 · Oct 24, 2024 · Updated last year
- ☆12 · Jan 10, 2025 · Updated last year
- A Vision-Language Model for Spatial Affordance Prediction in Robotics ☆220 · Jul 17, 2025 · Updated 9 months ago
- [ICCV 2025] Official code for Perspective-Aware Reasoning in Vision-Language Models via Mental Imagery Simulation ☆60 · Sep 12, 2025 · Updated 7 months ago
- [NeurIPS 2025 Spotlight] Official implementation of Spatial-MLLM: Boosting MLLM Capabilities in Visual-based Spatial Intelligence ☆462 · Feb 5, 2026 · Updated 2 months ago
- Orient Anything, ICML 2025 ☆378 · Feb 6, 2026 · Updated 2 months ago
- [ICCV 2025] A Simple yet Effective Pathway to Empowering LLaVA to Understand and Interact with 3D World ☆376 · Oct 21, 2025 · Updated 6 months ago
- [ICCV'25] Ross3D: Reconstructive Visual Instruction Tuning with 3D-Awareness ☆69 · Jul 22, 2025 · Updated 9 months ago
- Code release for 'Struct2D: A Perception-Guided Framework for Spatial Reasoning in MLLMs' (NeurIPS 2025) ☆30 · Oct 28, 2025 · Updated 6 months ago
- Training recipe for SpatialReasoner [NeurIPS 2025] ☆44 · Apr 5, 2026 · Updated 3 weeks ago
- ViGiL3D: A Linguistically Diverse Dataset for 3D Visual Grounding ☆18 · Aug 8, 2025 · Updated 8 months ago
- GitHub repository for "Why Is Spatial Reasoning Hard for VLMs? An Attention Mechanism Perspective on Focus Areas" (ICML 2025) ☆73 · May 2, 2025 · Updated 11 months ago
- Official implementation of ECCV24 paper "SceneVerse: Scaling 3D Vision-Language Learning for Grounded Scene Understanding" ☆281 · Mar 19, 2025 · Updated last year
- Awesome-LLM-3D: a curated list of Multi-modal Large Language Model in 3D world Resources ☆2,187 · Apr 16, 2026 · Updated 2 weeks ago
- ☆42 · Jun 9, 2025 · Updated 10 months ago
- SpaceR: The first MLLM empowered by SG-RLVR for video spatial reasoning ☆110 · Jul 9, 2025 · Updated 9 months ago
- Official PyTorch implementation of CorrespondentDream: Enhancing 3D Fidelity of Text-to-3D using Cross-View Correspondences (CVPR 2024 Po… ☆19 · Apr 29, 2024 · Updated 2 years ago
- Official code for paper: N3D-VLM: Native 3D Grounding Enables Accurate Spatial Reasoning in Vision-Language Models ☆100 · Jan 14, 2026 · Updated 3 months ago
- [CVPR 2024] Probing the 3D Awareness of Visual Foundation Models ☆348 · Dec 1, 2025 · Updated 5 months ago
- [NeurIPS 2024 & TPAMI 2026] Chat-Scene: Bridging 3D Scene and Large Language Models with Object Identifiers ☆211 · Apr 12, 2026 · Updated 2 weeks ago
- A paper list for spatial reasoning ☆733 · Jan 19, 2026 · Updated 3 months ago
- [TACL'23] VSR: A probing benchmark for spatial understanding of vision-language models. ☆143 · Mar 25, 2023 · Updated 3 years ago
- From Flatland to Space (SPAR). Accepted to NeurIPS 2025 Datasets & Benchmarks. A large-scale dataset & benchmark for 3D spatial perceptio… ☆84 · Jan 5, 2026 · Updated 3 months ago
- [CVPR 2024] Open3DSG: Open-Vocabulary 3D Scene Graphs from Point Clouds with Queryable Objects and Open-Set Relationships ☆152 · Sep 16, 2024 · Updated last year
- Code for paper "Super-CLEVR: A Virtual Benchmark to Diagnose Domain Robustness in Visual Reasoning" ☆47 · Feb 19, 2026 · Updated 2 months ago
- [CVPR 2026] VLM-3R: Vision-Language Models Augmented with Instruction-Aligned 3D Reconstruction ☆379 · Apr 23, 2026 · Updated last week
- [NeurIPS 2024] MSR3D: Advanced Situated Reasoning in 3D Scenes ☆72 · Dec 2, 2025 · Updated 4 months ago
- Tame a Wild Camera: In-the-Wild Monocular Camera Calibration ☆88 · Dec 28, 2023 · Updated 2 years ago
- [NeurIPS 2025] SpatialLM: Training Large Language Models for Structured Indoor Modeling ☆4,528 · Sep 26, 2025 · Updated 7 months ago
- [CoRL 2024] VLM-Grounder: A VLM Agent for Zero-Shot 3D Visual Grounding ☆131 · May 22, 2025 · Updated 11 months ago
- [CVPR 2025] The code for paper "Video-3D LLM: Learning Position-Aware Video Representation for 3D Scene Understanding". ☆211 · Jun 4, 2025 · Updated 10 months ago
- ☆12 · Apr 18, 2025 · Updated last year
- ☆13 · Mar 28, 2025 · Updated last year
- [3DV 2026] Open Vocabulary Monocular 3D Object Detection ☆86 · Updated this week
- [CVPR 2025] Program synthesis for 3D spatial reasoning ☆59 · Jun 16, 2025 · Updated 10 months ago