EmbodiedCity / UrbanVideo-Bench.code
[ACL'25 Oral] Code for the paper "UrbanVideo-Bench: Benchmarking Vision-Language Models on Embodied Intelligence with Video Data in Urban Spaces"
☆24Updated 6 months ago
Alternatives and similar repositories for UrbanVideo-Bench.code
Users who are interested in UrbanVideo-Bench.code are comparing it to the repositories listed below.
- ☆87Updated 8 months ago
- Target-Grounded Graph-Aware Transformer for Aerial Vision-and-Dialog Navigation, AVDN Challenge, ICCV CLVL 2023.☆21Updated 2 years ago
- [AAAI-25 Oral] Official Implementation of "FLAME: Learning to Navigate with Multimodal LLM in Urban Environments"☆69Updated 3 months ago
- [ECCV 2024] Official implementation of C-Instructor: Controllable Navigation Instruction Generation with Chain of Thought Prompting☆29Updated last year
- The official repository of [CVPR2025] DSPNet: Dual-vision Scene Perception for Robust 3D Question Answering☆25Updated 9 months ago
- Codebase of ACL 2023 Findings "Aerial Vision-and-Dialog Navigation"☆61Updated last year
- Official implementation of: Bootstrapping Language-Guided Navigation Learning with Self-Refining Data Flywheel☆34Updated 7 months ago
- Official implementation of NavMorph: A Self-Evolving World Model for Vision-and-Language Navigation in Continuous Environments (ICCV'25).☆66Updated last month
- [ECCV 2024] Empowering 3D Visual Grounding with Reasoning Capabilities☆80Updated last year
- [CVPR 2024] Visual Programming for Zero-shot Open-Vocabulary 3D Visual Grounding☆62Updated last year
- ☆30Updated 2 months ago
- STI-Bench: Are MLLMs Ready for Precise Spatial-Temporal World Understanding?☆35Updated 3 weeks ago
- ☆54Updated last year
- ☆41Updated 7 months ago
- The official repo of our work "Pensieve: Retrospect-then-Compare mitigates Visual Hallucination"☆16Updated last year
- [NeurIPS-2024] The official implementation of "Instruction-Guided Visual Masking"☆40Updated last year
- ☆25Updated 11 months ago
- Official implementation of paper "GAPrompt: Geometry-Aware Point Cloud Prompt for 3D Vision Model", ICML 2025☆15Updated last month
- [AAAI 2024] The official implementation of the paper "3D-STMN: Dependency-Driven Superpoint-Text Matching Network for End-to-End 3D Refer…☆44Updated 2 years ago
- ☆15Updated last year
- [NeurIPS 2025] SURDS: Benchmarking Spatial Understanding and Reasoning in Driving Scenarios with Vision Language Models☆78Updated 4 months ago
- [ICLR 2026] OmniSpatial: Towards Comprehensive Spatial Reasoning Benchmark for Vision Language Models☆79Updated 2 weeks ago
- [ECCV24] Navigation Instruction Generation with BEV Perception and Large Language Models☆30Updated last year
- [NeurIPS 2024] MSR3D: Advanced Situated Reasoning in 3D Scenes☆70Updated 2 months ago
- [NeurIPS 2025] 3DRS: MLLMs Need 3D-Aware Representation Supervision for Scene Understanding☆145Updated last month
- ☆14Updated last year
- This is a PyTorch implementation of 3DRefTR proposed by our paper "A Unified Framework for 3D Point Cloud Visual Grounding"☆26Updated 2 years ago
- Official code for ECCV 2024 paper "Global-Local Collaborative Inference with LLM for Lidar-Based Open-Vocabulary Detection"☆14Updated last year
- This is the official repo of OpenSatMap in NeurIPS 2024 D&B Track☆29Updated 7 months ago
- Code&Data for Grounded 3D-LLM with Referent Tokens☆131Updated last year