OpenRobotLab / VLM-Grounder

[CoRL 2024] VLM-Grounder: A VLM Agent for Zero-Shot 3D Visual Grounding

☆108

Alternatives and similar repositories for VLM-Grounder

Users who are interested in VLM-Grounder are comparing it to the repositories listed below.

iris0329 / SeeGround
[CVPR'25] SeeGround: See and Ground for Zero-Shot Open-Vocabulary 3D Visual Grounding
☆152 Updated 2 months ago
facebookresearch / univlg
Unifying 2D and 3D Vision-Language Understanding
☆95 Updated 3 months ago
staymylove / 3DMIT
Code of 3DMIT: 3D MULTI-MODAL INSTRUCTION TUNING FOR SCENE UNDERSTANDING
☆30 Updated 11 months ago
OpenRobotLab / Grounded_3D-LLM
Code & Data for Grounded 3D-LLM with Referent Tokens
☆123 Updated 6 months ago
CurryYuan / ZSVG3D
[CVPR 2024] Visual Programming for Zero-shot Open-Vocabulary 3D Visual Grounding
☆54 Updated 11 months ago
BIT-DYN / OpenObj
[RAL 2024] OpenObj: Open-Vocabulary Object-Level Neural Radiance Fields with Fine-Grained Understanding
☆27 Updated 5 months ago
sg-3d / sg3d
☆49 Updated 9 months ago
YunzeMan / Lexicon3D
[NeurIPS 2024] Lexicon3D: Probing Visual Foundation Models for Complex 3D Scene Understanding
☆94 Updated 5 months ago
HaoyiZhu / SPA
[ICLR 2025] SPA: 3D Spatial-Awareness Enables Effective Embodied Representation
☆162 Updated 3 weeks ago
UMass-Embodied-AGI / 3D-Mem
[CVPR 2025] Source code for the paper "3D-Mem: 3D Scene Memory for Embodied Exploration and Reasoning"
☆149 Updated last month
WeitaiKang / Intent3D
[ICLR 2025] Intent3D: 3D Object Detection in RGB-D Scans Based on Human Intention
☆23 Updated 4 months ago
MSR3D / MSR3D
[NeurIPS 2024] Official code repository for MSR3D paper
☆60 Updated 3 weeks ago
pzhren / InfiniteWorld
☆63 Updated last month
yangtiming / ImOV3D
ImOV3D: Learning Open Vocabulary Point Clouds 3D Object Detection from Only 2D Images (NeurIPS2024)
☆82 Updated 6 months ago
xuxw98 / Online3D
[CVPR 2024] Memory-based Adapters for Online 3D Scene Perception
☆120 Updated 3 months ago
Jingkang50 / PSG4D
4D Panoptic Scene Graph Generation (NeurIPS'23 Spotlight)
☆110 Updated 4 months ago
honghd16 / GSA-VLN
Official repository of General Scene Adaptation for Vision-and-Language Navigation (ICLR'2025)
☆46 Updated 3 months ago
ZCMax / ScanReason
[ECCV 2024] Empowering 3D Visual Grounding with Reasoning Capabilities
☆75 Updated 9 months ago
OpenRobotLab / StreamVLN
Official implementation of the paper: "StreamVLN: Streaming Vision-and-Language Navigation via SlowFast Context Modeling"
☆76 Updated last week
LaVi-Lab / Video-3D-LLM
[CVPR 2025] The code for the paper "Video-3D LLM: Learning Position-Aware Video Representation for 3D Scene Understanding".
☆133 Updated last month
PKU-HMI-Lab / LIFT3D
[CVPR 2025] Lift3D Foundation Policy: Lifting 2D Large-Scale Pretrained Models for Robust 3D Robotic Manipulation
☆149 Updated 3 weeks ago
SceneFun3D / scenefun3d
SceneFun3D ToolKit
☆147 Updated 2 months ago
HaochenZ11 / VLA-3D
☆67 Updated 6 months ago
VinAIResearch / Open3DIS
Open3DIS: Open-vocabulary 3D Instance Segmentation with 2D Mask Guidance (CVPR 2024)
☆101 Updated 8 months ago
linukc / BeyondBareQueries
☆28 Updated last month
zjwzcx / GLEAM
[ICCV 2025] GLEAM: Learning Generalizable Exploration Policy for Active Mapping in Complex 3D Indoor Scene
☆61 Updated this week
facebookresearch / nwm
Official code for the CVPR 2025 paper "Navigation World Models".
☆297 Updated last week
CognitiveAISystems / 3DGraphLLM
3DGraphLLM is a model that uses a 3D scene graph and an LLM to perform 3D vision-language tasks.
☆66 Updated 2 months ago
PQ3D / PQ3D
Official implementation of the paper "Unifying 3D Vision-Language Understanding via Promptable Queries"
☆77 Updated 11 months ago
MrZihan / Sim2Real-VLN-3DFF
Official implementation of Sim-to-Real Transfer via 3D Feature Fields for Vision-and-Language Navigation (CoRL'24).
☆67 Updated 4 months ago