SparrowZheyuan18 / Awesome-Geolocalization
A Paper List for Geo-localization Research
☆14 Updated 10 months ago
Alternatives and similar repositories for Awesome-Geolocalization
Users who are interested in Awesome-Geolocalization are comparing it to the repositories listed below.
- [ICML 2024] GeoReasoner: Geo-localization with Reasoning in Street Views using a Large Vision-Language Model ☆54 Updated 7 months ago
- [NeurIPS'24] SpatialEval: a benchmark to evaluate spatial reasoning abilities of MLLMs and LLMs ☆45 Updated 5 months ago
- [arXiv 2025] Can MLLMs Guide Me Home? A Benchmark Study on Fine-Grained Visual Reasoning from Transit Maps ☆62 Updated 2 months ago
- (ICLR 2025 Spotlight) Official code repository for Interleaved Scene Graph. ☆22 Updated 5 months ago
- [ICCV 2025] Token Activation Map to Visually Explain Multimodal LLMs ☆41 Updated this week
- MetaSpatial leverages reinforcement learning to enhance 3D spatial reasoning in vision-language models (VLMs), enabling more structured, … ☆157 Updated 2 months ago
- Official PyTorch Code of ReKV (ICLR'25) ☆35 Updated 4 months ago
- ☆89 Updated 3 months ago
- Reinforcing Spatial Reasoning in Vision-Language Models with Interwoven Thinking and Visual Drawing ☆47 Updated 3 weeks ago
- Official repository of 'ScaleCap: Inference-Time Scalable Image Captioning via Dual-Modality Debiasing' ☆50 Updated 3 weeks ago
- [ICCV 2025] The official implementation of the paper “Street-to-Satellite Image Synthesis with Diffusion Models and BEV Paradigm” ☆60 Updated this week
- [CVPR'25] 🌟🌟 EgoTextVQA: Towards Egocentric Scene-Text Aware Video Question Answering ☆34 Updated 3 weeks ago
- [CVPR 2025] The code for the paper "Video-3D LLM: Learning Position-Aware Video Representation for 3D Scene Understanding". ☆138 Updated last month
- SpaceR: The first MLLM empowered by SG-RLVR for video spatial reasoning ☆69 Updated last week
- ☆12 Updated 7 months ago
- A paper list for spatial reasoning ☆121 Updated last month
- [LLaVA-Video-R1] ✨ First Adaptation of R1 to LLaVA-Video (2025-03-18) ☆29 Updated 2 months ago
- Official implementation of "RoboRefer: Towards Spatial Referring with Reasoning in Vision-Language Models for Robotics" ☆96 Updated last week
- [ICLR'25] Official code for the paper 'MLLMs Know Where to Look: Training-free Perception of Small Visual Details with Multimodal LLMs' ☆230 Updated 2 months ago
- ☆22 Updated last month
- Official repository for “FlowCut: Rethinking Redundancy via Information Flow for Efficient Vision-Language Models” ☆19 Updated 2 weeks ago
- ☆37 Updated last month
- ☆69 Updated 2 weeks ago
- [NeurIPS2024] Repo for the paper 'ControlMLLM: Training-Free Visual Prompt Learning for Multimodal Large Language Models' ☆184 Updated this week
- [NeurIPS'24] This repository is the implementation of "SpatialRGPT: Grounded Spatial Reasoning in Vision Language Models" ☆217 Updated 7 months ago
- 🔥 Official implementation of "Generate, but Verify: Reducing Visual Hallucination in Vision-Language Models with Retrospective Resamplin… ☆39 Updated last month
- Pixel-Level Reasoning Model trained with RL ☆167 Updated 2 weeks ago
- The official repo for "Where do Large Vision-Language Models Look at when Answering Questions?" ☆39 Updated last month
- SpatialScore: Towards Unified Evaluation for Multimodal Spatial Understanding ☆53 Updated last week
- Visual Planning: Let's Think Only with Images ☆258 Updated last month