[NeurIPS'24] This repository is the implementation of "SpatialRGPT: Grounded Spatial Reasoning in Vision Language Models"
☆329 · Dec 14, 2024 · Updated last year
Alternatives and similar repositories for SpatialRGPT
Users who are interested in SpatialRGPT are comparing it to the repositories listed below.
- Compose multimodal datasets 📹 ☆577 · Jun 18, 2026 · Updated last week
- The official repo for "SpatialBot: Precise Spatial Understanding with Vision Language Models." ☆347 · Sep 14, 2025 · Updated 9 months ago
- Official repo and evaluation implementation of VSI-Bench ☆728 · Aug 5, 2025 · Updated 10 months ago
- [NeurIPS'24] SpatialEval: a benchmark to evaluate spatial reasoning abilities of MLLMs and LLMs ☆60 · Jan 23, 2025 · Updated last year
- [ICLR 2025 Oral] Official Implementation for "Do Vision-Language Models Represent Space and How? Evaluating Spatial Frame of Reference Un… ☆21 · Oct 24, 2024 · Updated last year
- A Vision-Language Model for Spatial Affordance Prediction in Robotics ☆226 · Jul 17, 2025 · Updated 11 months ago
- [ICCV 2025] Official code for Perspective-Aware Reasoning in Vision-Language Models via Mental Imagery Simulation ☆62 · Sep 12, 2025 · Updated 9 months ago
- [NeurIPS 2025 Spotlight] Official implementation of Spatial-MLLM: Boosting MLLM Capabilities in Visual-based Spatial Intelligence ☆470 · Feb 5, 2026 · Updated 4 months ago
- [ICCV 2025] A Simple yet Effective Pathway to Empowering LLaVA to Understand and Interact with 3D World ☆380 · Oct 21, 2025 · Updated 8 months ago
- Orient Anything, ICML 2025 ☆384 · Feb 6, 2026 · Updated 4 months ago
- [ICCV'25] Ross3D: Reconstructive Visual Instruction Tuning with 3D-Awareness ☆70 · Jul 22, 2025 · Updated 11 months ago
- Code release for 'Struct2D: A Perception-Guided Framework for Spatial Reasoning in MLLMs' (NeurIPS 2025) ☆31 · Oct 28, 2025 · Updated 8 months ago
- Training recipe for SpatialReasoner [NeurIPS 2025] ☆45 · Apr 5, 2026 · Updated 2 months ago
- GitHub repository for "Why Is Spatial Reasoning Hard for VLMs? An Attention Mechanism Perspective on Focus Areas" (ICML 2025) ☆76 · May 2, 2025 · Updated last year
- ViGiL3D: A Linguistically Diverse Dataset for 3D Visual Grounding ☆19 · Aug 8, 2025 · Updated 10 months ago
- Official implementation of ECCV24 paper "SceneVerse: Scaling 3D Vision-Language Learning for Grounded Scene Understanding" ☆286 · Mar 19, 2025 · Updated last year
- Awesome-LLM-3D: a curated list of Multi-modal Large Language Model in 3D world Resources ☆2,225 · Apr 16, 2026 · Updated 2 months ago
- SpaceR: The first MLLM empowered by SG-RLVR for video spatial reasoning ☆112 · Jul 9, 2025 · Updated 11 months ago
- Official PyTorch implementation of CorrespondentDream: Enhancing 3D Fidelity of Text-to-3D using Cross-View Correspondences (CVPR 2024 Po… ☆19 · Apr 29, 2024 · Updated 2 years ago
- Official code for paper: N3D-VLM: Native 3D Grounding Enables Accurate Spatial Reasoning in Vision-Language Models ☆110 · Jan 14, 2026 · Updated 5 months ago
- [CVPR 2024] Probing the 3D Awareness of Visual Foundation Models ☆350 · Dec 1, 2025 · Updated 7 months ago
- [NeurIPS 2024 & TPAMI 2026] Chat-Scene: Bridging 3D Scene and Large Language Models with Object Identifiers ☆213 · Apr 12, 2026 · Updated 2 months ago
- A paper list for spatial reasoning ☆760 · Jan 19, 2026 · Updated 5 months ago
- [TACL'23] VSR: A probing benchmark for spatial understanding of vision-language models. ☆147 · Mar 25, 2023 · Updated 3 years ago
- From Flatland to Space (SPAR). Accepted to NeurIPS 2025 Datasets & Benchmarks. A large-scale dataset & benchmark for 3D spatial perceptio… ☆87 · Jan 5, 2026 · Updated 5 months ago
- [CVPR 2024] Open3DSG: Open-Vocabulary 3D Scene Graphs from Point Clouds with Queryable Objects and Open-Set Relationships ☆165 · Sep 16, 2024 · Updated last year
- Code for paper "Super-CLEVR: A Virtual Benchmark to Diagnose Domain Robustness in Visual Reasoning" ☆47 · Feb 19, 2026 · Updated 4 months ago
- [CVPR 2026] VLM-3R: Vision-Language Models Augmented with Instruction-Aligned 3D Reconstruction ☆413 · Apr 23, 2026 · Updated 2 months ago
- [NeurIPS 2024] MSR3D: Multimodal Situated Reasoning in 3D Scenes ☆74 · Dec 2, 2025 · Updated 6 months ago
- Tame a Wild Camera: In-the-Wild Monocular Camera Calibration ☆89 · Dec 28, 2023 · Updated 2 years ago
- [NeurIPS 2025] SpatialLM: Training Large Language Models for Structured Indoor Modeling ☆4,597 · Sep 26, 2025 · Updated 9 months ago
- [CoRL 2024] VLM-Grounder: A VLM Agent for Zero-Shot 3D Visual Grounding ☆134 · May 22, 2025 · Updated last year
- [CVPR 2025] The code for paper "Video-3D LLM: Learning Position-Aware Video Representation for 3D Scene Understanding". ☆215 · Jun 4, 2025 · Updated last year
- [3DV 2026] Open Vocabulary Monocular 3D Object Detection ☆95 · Apr 29, 2026 · Updated 2 months ago
- Code for 3D-LLM: Injecting the 3D World into Large Language Models ☆1,205 · Jun 6, 2024 · Updated 2 years ago