[NeurIPS'24] This repository is the implementation of "SpatialRGPT: Grounded Spatial Reasoning in Vision Language Models"
☆317 · Dec 14, 2024 · Updated last year
Alternatives and similar repositories for SpatialRGPT
Users that are interested in SpatialRGPT are comparing it to the libraries listed below.
- Compose multimodal datasets ☆554 · Jan 5, 2026 · Updated 3 months ago
- The official repo for "SpatialBot: Precise Spatial Understanding with Vision Language Models." ☆340 · Sep 14, 2025 · Updated 6 months ago
- Official repo and evaluation implementation of VSI-Bench ☆694 · Aug 5, 2025 · Updated 8 months ago
- [NeurIPS'24] SpatialEval: a benchmark to evaluate spatial reasoning abilities of MLLMs and LLMs ☆60 · Jan 23, 2025 · Updated last year
- [ICLR 2025 Oral] Official Implementation for "Do Vision-Language Models Represent Space and How? Evaluating Spatial Frame of Reference Un… ☆21 · Oct 24, 2024 · Updated last year
- A Vision-Language Model for Spatial Affordance Prediction in Robotics ☆218 · Jul 17, 2025 · Updated 8 months ago
- [ICCV 2025] Official code for Perspective-Aware Reasoning in Vision-Language Models via Mental Imagery Simulation ☆58 · Sep 12, 2025 · Updated 6 months ago
- [NeurIPS 2025] Official implementation of Spatial-MLLM: Boosting MLLM Capabilities in Visual-based Spatial Intelligence ☆457 · Feb 5, 2026 · Updated 2 months ago
- Orient Anything, ICML 2025 ☆378 · Feb 6, 2026 · Updated 2 months ago
- [ICCV 2025] A Simple yet Effective Pathway to Empowering LLaVA to Understand and Interact with 3D World ☆375 · Oct 21, 2025 · Updated 5 months ago
- [ICCV'25] Ross3D: Reconstructive Visual Instruction Tuning with 3D-Awareness ☆68 · Jul 22, 2025 · Updated 8 months ago
- Code release for 'Struct2D: A Perception-Guided Framework for Spatial Reasoning in MLLMs' (NeurIPS 2025) ☆30 · Oct 28, 2025 · Updated 5 months ago
- Training recipe for SpatialReasoner [NeurIPS 2025] ☆41 · Updated this week
- ViGiL3D: A Linguistically Diverse Dataset for 3D Visual Grounding ☆18 · Aug 8, 2025 · Updated 8 months ago
- Github repository for "Why Is Spatial Reasoning Hard for VLMs? An Attention Mechanism Perspective on Focus Areas" (ICML 2025) ☆73 · May 2, 2025 · Updated 11 months ago
- Official implementation of ECCV24 paper "SceneVerse: Scaling 3D Vision-Language Learning for Grounded Scene Understanding" ☆277 · Mar 19, 2025 · Updated last year
- Awesome-LLM-3D: a curated list of Multi-modal Large Language Model in 3D world Resources ☆2,152 · Apr 1, 2026 · Updated last week
- SpaceR: The first MLLM empowered by SG-RLVR for video spatial reasoning ☆110 · Jul 9, 2025 · Updated 9 months ago
- Official PyTorch implementation of CorrespondentDream: Enhancing 3D Fidelity of Text-to-3D using Cross-View Correspondences (CVPR 2024 Po… ☆19 · Apr 29, 2024 · Updated last year
- Official code for paper: N3D-VLM: Native 3D Grounding Enables Accurate Spatial Reasoning in Vision-Language Models ☆99 · Jan 14, 2026 · Updated 2 months ago
- [CVPR 2024] Probing the 3D Awareness of Visual Foundation Models ☆347 · Dec 1, 2025 · Updated 4 months ago
- A paper list for spatial reasoning ☆706 · Jan 19, 2026 · Updated 2 months ago
- Code for "Chat-Scene: Bridging 3D Scene and Large Language Models with Object Identifiers" (NeurIPS 2024) ☆209 · Oct 20, 2025 · Updated 5 months ago
- [TACL'23] VSR: A probing benchmark for spatial understanding of vision-language models. ☆143 · Mar 25, 2023 · Updated 3 years ago
- From Flatland to Space (SPAR). Accepted to NeurIPS 2025 Datasets & Benchmarks. A large-scale dataset & benchmark for 3D spatial perceptio… ☆84 · Jan 5, 2026 · Updated 3 months ago
- Code for paper "Super-CLEVR: A Virtual Benchmark to Diagnose Domain Robustness in Visual Reasoning" ☆47 · Feb 19, 2026 · Updated last month
- [CVPR 2026] VLM-3R: Vision-Language Models Augmented with Instruction-Aligned 3D Reconstruction ☆369 · Mar 9, 2026 · Updated last month
- [NeurIPS 2024] MSR3D: Advanced Situated Reasoning in 3D Scenes ☆71 · Dec 2, 2025 · Updated 4 months ago
- [NeurIPS 2025] SpatialLM: Training Large Language Models for Structured Indoor Modeling ☆4,494 · Sep 26, 2025 · Updated 6 months ago
- Tame a Wild Camera: In-the-Wild Monocular Camera Calibration ☆88 · Dec 28, 2023 · Updated 2 years ago
- [CVPR 2025] The code for paper "Video-3D LLM: Learning Position-Aware Video Representation for 3D Scene Understanding". ☆207 · Jun 4, 2025 · Updated 10 months ago
- [CVPR 2024] Open3DSG: Open-Vocabulary 3D Scene Graphs from Point Clouds with Queryable Objects and Open-Set Relationships ☆150 · Sep 16, 2024 · Updated last year
- [CoRL 2024] VLM-Grounder: A VLM Agent for Zero-Shot 3D Visual Grounding ☆130 · May 22, 2025 · Updated 10 months ago
- [3DV 2026] Open Vocabulary Monocular 3D Object Detection ☆86 · Nov 25, 2025 · Updated 4 months ago
- [CVPR 2025] Program synthesis for 3D spatial reasoning ☆58 · Jun 16, 2025 · Updated 9 months ago