tsinghua-fib-lab / UrbanLLaVA
[ICCV 2025] UrbanLLaVA: A Multi-modal Large Language Model for Urban Intelligence with Spatial Reasoning and Understanding
☆26Updated 3 weeks ago
Alternatives and similar repositories for UrbanLLaVA
Users interested in UrbanLLaVA are comparing it to the repositories listed below.
- The official implementation of the paper "CrossViewDiff: A Cross-View Diffusion Model for Satellite-to-Street View Synthesis"☆14Updated 10 months ago
- ☆37Updated last month
- [ICCV 2025] The official implementation of the paper “Street-to-Satellite Image Synthesis with Diffusion Models and BEV Paradigm”☆60Updated this week
- ☆39Updated last year
- [ICLR 2025] Aligning Generative Denoising with Discriminative Objectives Unleashes Diffusion for Visual Perception☆14Updated last week
- [ICML 2024] GeoReasoner: Geo-localization with Reasoning in Street Views using a Large Vision-Language Model☆54Updated 7 months ago
- [CVPR 2025 Highlight🔥] Official code repository for "Inst3D-LMM: Instance-Aware 3D Scene Understanding with Multi-modal Instruction Tuni…☆93Updated 2 months ago
- Autoregressive Semantic Visual Reconstruction Helps VLMs Understand Better☆31Updated last month
- [CVPR 2025] UniPre3D: Unified Pre-training of 3D Point Cloud Models with Cross-Modal Gaussian Splatting☆36Updated last week
- [ICML 2025] VistaDPO: Video Hierarchical Spatial-Temporal Direct Preference Optimization for Large Video Models☆27Updated last month
- ☆45Updated 2 months ago
- GEOBench-VLM: Benchmarking Vision-Language Models for Geospatial Tasks☆56Updated 2 weeks ago
- [ECCV 2024 Oral] The official implementation of paper: COHO: Context-Sensitive City-Scale Hierarchical Urban Layout Generation☆10Updated 11 months ago
- Implementation of the paper: VG4D: Vision-Language Model Goes 4D Video Recognition (ICRA 2024)☆15Updated last year
- ☆101Updated 7 months ago
- [IEEE RA-L 2025] Generate Weather with LLM. Code for "WeatherDG: LLM-assisted Procedural Weather Generation for Domain-Generalized Semant…☆38Updated last month
- SpatialScore: Towards Unified Evaluation for Multimodal Spatial Understanding☆53Updated last week
- This code is used to get images from google maps given a GPS region or a center GPS point and a Zoom level.☆17Updated 7 months ago
- The official implementation of Distilling Diffusion Models to Efficient 3D LiDAR Scene Completion☆42Updated last week
- [CVPR'25] Official implementation of "Semantic Library Adaptation: LoRA Retrieval and Fusion for Open-Vocabulary Semantic Segmentation"☆29Updated 2 weeks ago
- OmniSpatial: Towards Comprehensive Spatial Reasoning Benchmark for Vision Language Models☆48Updated 2 weeks ago
- Official implementation of "Ross3D: Reconstructive Visual Instruction Tuning with 3D-Awareness".☆44Updated 3 weeks ago
- The official implementation of "PixelThink: Towards Efficient Chain-of-Pixel Reasoning" (arXiv 2025)☆35Updated last month
- Reinforcing Spatial Reasoning in Vision-Language Models with Interwoven Thinking and Visual Drawing☆47Updated 3 weeks ago
- Seeing from Another Perspective: Evaluating Multi-View Understanding in MLLMs☆43Updated last month
- [arXiv 2025] Can MLLMs Guide Me Home? A Benchmark Study on Fine-Grained Visual Reasoning from Transit Maps☆62Updated 2 months ago
- [AAAI 2025] GFlow: Recovering 4D World from Monocular Video☆46Updated 2 months ago
- This is the official repo of OpenSatMap in NeurIPS 2024 D&B Track☆23Updated last week
- ACTIVE-O3: Empowering Multimodal Large Language Models with Active Perception via GRPO☆62Updated last month
- ☆14Updated 2 months ago