[CoRL 2024] VLM-Grounder: A VLM Agent for Zero-Shot 3D Visual Grounding
☆129 · May 22, 2025 · Updated 9 months ago
Alternatives and similar repositories for VLM-Grounder
Users who are interested in VLM-Grounder are comparing it to the repositories listed below.
- Code & Data for Grounded 3D-LLM with Referent Tokens ☆132 · Jan 5, 2025 · Updated last year
- Up-to-date & curated list of awesome 3D Visual Grounding papers, methods & resources. ☆261 · Jan 14, 2026 · Updated last month
- [ECCV 2024] Empowering 3D Visual Grounding with Reasoning Capabilities ☆81 · Oct 10, 2024 · Updated last year
- [CVPR 2024] Visual Programming for Zero-shot Open-Vocabulary 3D Visual Grounding ☆62 · Aug 3, 2024 · Updated last year
- [CVPR 2024 & NeurIPS 2024] EmbodiedScan: A Holistic Multi-Modal 3D Perception Suite Towards Embodied AI ☆652 · Jun 13, 2025 · Updated 8 months ago
- BehAV: Behavioral Rule Guided Autonomy Using VLM for Robot Navigation in Outdoor Scenes (ICRA'25) ☆38 · Oct 3, 2024 · Updated last year
- [ICLR 2025 Oral] Official Implementation for "Do Vision-Language Models Represent Space and How? Evaluating Spatial Frame of Reference Un… ☆21 · Oct 24, 2024 · Updated last year
- [CVPR'25] SeeGround: See and Ground for Zero-Shot Open-Vocabulary 3D Visual Grounding ☆208 · Apr 21, 2025 · Updated 10 months ago
- [ICLR 2026] MMSI-Bench: A Benchmark for Multi-Image Spatial Intelligence ☆78 · Updated this week
- Code for "Chat-Scene: Bridging 3D Scene and Large Language Models with Object Identifiers" (NeurIPS 2024) ☆206 · Oct 20, 2025 · Updated 4 months ago
- Code for "Robot See Robot Do" presented at CoRL 2024! ☆157 · Nov 26, 2024 · Updated last year
- ☆10 · Oct 18, 2024 · Updated last year
- [NeurIPS 2024] SCube: Instant Large-Scale Scene Reconstruction using VoxSplats ☆517 · Oct 14, 2025 · Updated 4 months ago
- Official implementation of ECCV24 paper "SceneVerse: Scaling 3D Vision-Language Learning for Grounded Scene Understanding" ☆278 · Mar 19, 2025 · Updated 11 months ago
- [AAAI 2025] Official data and code for "TB-HSU: Hierarchical 3D Scene Understanding with Contextual Affordances" ☆15 · Sep 11, 2025 · Updated 5 months ago
- [ECCV 2024] 4D Contrastive Superflows are Dense 3D Representation Learners ☆51 · Dec 4, 2025 · Updated 2 months ago
- Open-source code for Paper: Real-Time Metric-Semantic Mapping for Autonomous Navigation in Outdoor Environments ☆115 · Sep 26, 2024 · Updated last year
- [NeurIPS 2025] InternScenes: A Large-scale Interactive Indoor Scene Dataset with Realistic Layouts. ☆226 · Oct 17, 2025 · Updated 4 months ago
- Visual Embodied Brain: Let Multimodal Large Language Models See, Think, and Control in Spaces ☆88 · Jun 6, 2025 · Updated 8 months ago
- GOTPR: General Outdoor Text-based Place Recognition Using Scene Graph Retrieval with OpenStreetMap ☆32 · May 22, 2025 · Updated 9 months ago
- [ICCV 2025] A Simple yet Effective Pathway to Empowering LLaVA to Understand and Interact with 3D World ☆373 · Oct 21, 2025 · Updated 4 months ago
- Evaluation tool for the LILocBench benchmark challenge ☆24 · Aug 8, 2025 · Updated 6 months ago
- Code release for Revisit Anything: Visual Place Recognition via Image Segment Retrieval (ECCV 2024) ☆165 · Aug 2, 2025 · Updated 6 months ago
- The official repo for "SpatialBot: Precise Spatial Understanding with Vision Language Models." ☆335 · Sep 14, 2025 · Updated 5 months ago
- [ICCV 2025] 3DGraphLLM is a model that uses a 3D scene graph and an LLM to perform 3D vision-language tasks. ☆104 · Dec 10, 2025 · Updated 2 months ago
- Official PyTorch implementation of the paper “CLIP-GS: CLIP-Informed Gaussian Splatting for Real-time and View-consistent 3D Semantic Und… ☆55 · Apr 25, 2024 · Updated last year
- [CVPR 2023] We propose a framework for the challenging 3D-aware ObjectNav based on two straightforward sub-policies. The two sub-policies,… ☆78 · May 30, 2024 · Updated last year
- Constraint Satisfaction Visual Grounding ☆15 · Aug 10, 2025 · Updated 6 months ago
- CP-SLAM: Collaborative Neural Point-based SLAM ☆57 · Sep 9, 2024 · Updated last year
- Code for ICRA 2024 paper "VOLoc: Visual Place Recognition by Querying Compressed Lidar Map" ☆56 · Feb 27, 2024 · Updated 2 years ago
- [ICRA2024] Official Repository for PeLiCal: Targetless Extrinsic Calibration via Penetrating Lines for RGB-D Cameras with Limited Co-visi… ☆39 · Apr 23, 2024 · Updated last year
- [RSS 2024] Learning Manipulation by Predicting Interaction ☆120 · Jul 2, 2025 · Updated 7 months ago
- Official implementation of "Re3Sim: Generating High-Fidelity Simulation Data via 3D-Photorealistic Real-to-Sim for Robotic Manipulation" ☆133 · Sep 18, 2025 · Updated 5 months ago
- Code & data for "RoboGround: Robotic Manipulation with Grounded Vision-Language Priors" (CVPR 2025) ☆38 · May 25, 2025 · Updated 9 months ago
- Some examples to show how to use Quatro implemented in TEASER++ library ☆42 · Jan 5, 2024 · Updated 2 years ago
- [RSS 2024] iMESA - an incremental distributed algorithm for Collaborative Simultaneous Localization and Mapping ☆111 · Aug 16, 2024 · Updated last year
- [NeurIPS'24] Large Spatial Model: End-to-end Unposed Images to Semantic 3D ☆227 · Feb 11, 2026 · Updated 2 weeks ago
- Code for IROS 2022 work "Visual-Inertial Multi-Instance Dynamic SLAM with Object-level Relocalisation" ☆41 · Mar 20, 2023 · Updated 2 years ago
- [ECCV 2024 Best Paper Candidate & TPAMI 2025] PointLLM: Empowering Large Language Models to Understand Point Clouds ☆975 · Aug 14, 2025 · Updated 6 months ago