EmbodiedCity / UrbanVideo-Bench.code
[ACL'25 Oral] Code for the paper "UrbanVideo-Bench: Benchmarking Vision-Language Models on Embodied Intelligence with Video Data in Urban Spaces"
☆26Updated 6 months ago
Alternatives and similar repositories for UrbanVideo-Bench.code
Users who are interested in UrbanVideo-Bench.code are comparing it to the repositories listed below
- ☆87Updated 8 months ago
- Target-Grounded Graph-Aware Transformer for Aerial Vision-and-Dialog Navigation, AVDN Challenge, ICCV CLVL 2023.☆21Updated 2 years ago
- [ECCV 2024] Official implementation of C-Instructor: Controllable Navigation Instruction Generation with Chain of Thought Prompting☆29Updated last year
- [AAAI-25 Oral] Official Implementation of "FLAME: Learning to Navigate with Multimodal LLM in Urban Environments"☆69Updated 3 months ago
- The official repository of [CVPR2025] DSPNet: Dual-vision Scene Perception for Robust 3D Question Answering☆25Updated 9 months ago
- [NeurIPS-2024] The official implementation of "Instruction-Guided Visual Masking"☆40Updated last year
- Official implementation of NavMorph: A Self-Evolving World Model for Vision-and-Language Navigation in Continuous Environments (ICCV'25).☆66Updated last month
- [ECCV 2024] Empowering 3D Visual Grounding with Reasoning Capabilities☆80Updated last year
- ☆54Updated last year
- Benchmark and model for step-by-step reasoning in autonomous driving.☆68Updated 10 months ago
- Official implementation of: Bootstrapping Language-Guided Navigation Learning with Self-Refining Data Flywheel☆34Updated 7 months ago
- Codebase of ACL 2023 Findings "Aerial Vision-and-Dialog Navigation"☆61Updated last year
- [CVPR 2024] Visual Programming for Zero-shot Open-Vocabulary 3D Visual Grounding☆62Updated last year
- ☆22Updated 8 months ago
- Official code for ECCV 2024 paper "Global-Local Collaborative Inference with LLM for Lidar-Based Open-Vocabulary Detection"☆14Updated last year
- ☆41Updated 7 months ago
- ☆15Updated last year
- [ICLR 2026] OmniSpatial: Towards Comprehensive Spatial Reasoning Benchmark for Vision Language Models☆79Updated 2 weeks ago
- [NeurIPS 2024] MSR3D: Advanced Situated Reasoning in 3D Scenes☆70Updated 2 months ago
- [NeurIPS 2025] SURDS: Benchmarking Spatial Understanding and Reasoning in Driving Scenarios with Vision Language Models☆78Updated 4 months ago
- ☆30Updated 2 months ago
- [NeurIPS 2025] 3DRS: MLLMs Need 3D-Aware Representation Supervision for Scene Understanding☆145Updated last month
- ☆71Updated last year
- Fast-Slow Test-time Adaptation for Online Vision-and-Language Navigation☆30Updated 2 months ago
- STI-Bench: Are MLLMs Ready for Precise Spatial-Temporal World Understanding?☆35Updated 3 weeks ago
- This is the official repo of OpenSatMap in NeurIPS 2024 D&B Track☆29Updated 7 months ago
- Repository for Vision-and-Language Navigation via Causal Learning (Accepted by CVPR 2024)☆98Updated 8 months ago
- Official implementation of paper "GAPrompt: Geometry-Aware Point Cloud Prompt for 3D Vision Model", ICML 2025☆15Updated last month
- Code & Data for Grounded 3D-LLM with Referent Tokens☆131Updated last year
- [NeurIPS 2025] Official implementation of "RoboRefer: Towards Spatial Referring with Reasoning in Vision-Language Models for Robotics"☆226Updated last month