EmbodiedCity / UrbanVideo-Bench.code

[ACL'25 Oral] Code for the paper "UrbanVideo-Bench: Benchmarking Vision-Language Models on Embodied Intelligence with Video Data in Urban Spaces"

☆26

Alternatives and similar repositories for UrbanVideo-Bench.code

Users who are interested in UrbanVideo-Bench.code are comparing it to the repositories listed below

EmbodiedCity / Embodied-R.code
☆87 Updated 8 months ago
yifeisu / TG-GAT
Target-Grounded Graph-Aware Transformer for Aerial Vision-and-Dialog Navigation, AVDN Challenge, ICCV CLVL 2023.
☆21 Updated 2 years ago
refkxh / C-Instructor
[ECCV 2024] Official implementation of C-Instructor: Controllable Navigation Instruction Generation with Chain of Thought Prompting
☆29 Updated last year
xyz9911 / FLAME
[AAAI-25 Oral] Official Implementation of "FLAME: Learning to Navigate with Multimodal LLM in Urban Environments"
☆69 Updated 3 months ago
LZ-CH / DSPNet
The official repository of [CVPR2025] DSPNet: Dual-vision Scene Perception for Robust 3D Question Answering
☆25 Updated 9 months ago
2toinf / IVM
[NeurIPS-2024] The official implementation of "Instruction-Guided Visual Masking"
☆40 Updated last year
Feliciaxyao / NavMorph
Official implementation of NavMorph: A Self-Evolving World Model for Vision-and-Language Navigation in Continuous Environments (ICCV'25).
☆66 Updated last month
ZCMax / ScanReason
[ECCV 2024] Empowering 3D Visual Grounding with Reasoning Capabilities
☆80 Updated last year
sg-3d / sg3d
☆54 Updated last year
ayesha-ishaq / DriveLMM-o1
Benchmark and model for step-by-step reasoning in autonomous driving.
☆68 Updated 10 months ago
wz0919 / VLN-SRDF
Official implementation of: Bootstrapping Language-Guided Navigation Learning with Self-Refining Data Flywheel
☆34 Updated 7 months ago
eric-ai-lab / Aerial-Vision-and-Dialog-Navigation
Codebase of ACL 2023 Findings "Aerial Vision-and-Dialog Navigation"
☆61 Updated last year
CurryYuan / ZSVG3D
[CVPR 2024] Visual Programming for Zero-shot Open-Vocabulary 3D Visual Grounding
☆62 Updated last year
3DLLM-Mem / 3DLLM-Mem
☆22 Updated 8 months ago
GradiusTwinbee / GLIS
Official code for ECCV 2024 paper "Global-Local Collaborative Inference with LLM for Lidar-Based Open-Vocabulary Detection"
☆14 Updated last year
SJTU-DENG-Lab / R1-Zero-VSI
☆41 Updated 7 months ago
zhangzaibin / AD-H
☆15 Updated last year
qizekun / OmniSpatial
[ICLR 2026] OmniSpatial: Towards Comprehensive Spatial Reasoning Benchmark for Vision Language Models
☆79 Updated 2 weeks ago
MSR3D / MSR3D
[NeurIPS 2024] MSR3D: Advanced Situated Reasoning in 3D Scenes
☆70 Updated 2 months ago
XiandaGuo / Drive-MLLM
[NeurIPS 2025] SURDS: Benchmarking Spatial Understanding and Reasoning in Driving Scenarios with Vision Language Models
☆78 Updated 4 months ago
VinceOuti / Open3DVQA
☆30 Updated 2 months ago
Visual-AI / 3DRS
[NeurIPS 2025] 3DRS: MLLMs Need 3D-Aware Representation Supervision for Scene Understanding
☆145 Updated last month
xmed-lab / NuInstruct
☆71 Updated last year
Feliciaxyao / ICML2024-FSTTA
Fast-Slow Test-time Adaptation for Online Vision-and-Language Navigation
☆30 Updated 2 months ago
MINT-SJTU / STI-Bench
STI-Bench: Are MLLMs Ready for Precise Spatial-Temporal World Understanding?
☆35 Updated 3 weeks ago
bjzhb666 / OpenSatMap-offical
This is the official repo of OpenSatMap in NeurIPS 2024 D&B Track
☆29 Updated 7 months ago
CrystalSixone / VLN-GOAT
Repository for Vision-and-Language Navigation via Causal Learning (Accepted by CVPR 2024)
☆98 Updated 8 months ago
zhoujiahuan1991 / ICML2025-GAPrompt
Official implementation of paper "GAPrompt: Geometry-Aware Point Cloud Prompt for 3D Vision Model", ICML 2025
☆15 Updated last month
InternRobotics / Grounded_3D-LLM
Code&Data for Grounded 3D-LLM with Referent Tokens
☆131 Updated last year
Zhoues / RoboRefer
[NeurIPS 2025] Official implementation of "RoboRefer: Towards Spatial Referring with Reasoning in Vision-Language Models for Robotics"
☆226 Updated last month