taco-group / MapBench
☆28Updated 3 months ago
Alternatives and similar repositories for MapBench
Users who are interested in MapBench are comparing it to the repositories listed below.
- ☆37Updated last month
- [arXiv 2025] Can MLLMs Guide Me Home? A Benchmark Study on Fine-Grained Visual Reasoning from Transit Maps☆62Updated 2 months ago
- Evaluating Deep Multimodal Reasoning in Vision-Centric Agentic Tasks☆14Updated last month
- Official implementation of CEED-VLA: Consistency Vision-Language-Action Model with Early-Exit Decoding.☆25Updated 3 weeks ago
- ☆19Updated 2 months ago
- ☆33Updated last year
- [NeurIPS'24] SpatialEval: a benchmark to evaluate spatial reasoning abilities of MLLMs and LLMs☆45Updated 5 months ago
- Scaffold Prompting to promote LMMs☆43Updated 7 months ago
- ☆45Updated 2 months ago
- [ICLR 2025] Official implementation and benchmark evaluation repository of <PhysBench: Benchmarking and Enhancing Vision-Language Models …☆64Updated last month
- Official Implementation of ISR-DPO: Aligning Large Multimodal Models for Videos by Iterative Self-Retrospective DPO (AAAI'25)☆20Updated 5 months ago
- Visual Planning: Let's Think Only with Images☆258Updated last month
- The official repo for "Where do Large Vision-Language Models Look at when Answering Questions?"☆39Updated last month
- ☆19Updated last month
- ZoomEye: Enhancing Multimodal LLMs with Human-Like Zooming Capabilities through Tree-Based Image Exploration☆46Updated 6 months ago
- ☆12Updated 7 months ago
- LEO: A powerful Hybrid Multimodal LLM☆18Updated 5 months ago
- ☆11Updated 3 months ago
- Autoregressive Semantic Visual Reconstruction Helps VLMs Understand Better☆31Updated last month
- Official code for Perspective-Aware Reasoning in Vision-Language Models via Mental Imagery Simulation☆32Updated 2 months ago
- 4D Panoptic Scene Graph Generation (NeurIPS'23 Spotlight)☆110Updated 4 months ago
- STI-Bench: Are MLLMs Ready for Precise Spatial-Temporal World Understanding?☆22Updated last week
- [CVPR 2025] DiscoVLA: Discrepancy Reduction in Vision, Language, and Alignment for Parameter-Efficient Video-Text Retrieval☆17Updated 3 weeks ago
- Official Implementation of DINO-Foresight: Looking into the Future with DINO☆54Updated 4 months ago
- OpenVLThinker: An Early Exploration to Vision-Language Reasoning via Iterative Self-Improvement☆93Updated last week
- High-Resolution Visual Reasoning via Multi-Turn Grounding-Based Reinforcement Learning☆29Updated last week
- AutoTrust, a groundbreaking benchmark designed to assess the trustworthiness of DriveVLMs. This work aims to enhance public safety by ens…☆46Updated 6 months ago
- ☆45Updated last month
- 3DGraphLLM is a model that uses a 3D scene graph and an LLM to perform 3D vision-language tasks.☆66Updated 2 months ago
- Official implementation for the paper "Towards Understanding How Knowledge Evolves in Large Vision-Language Models"☆17Updated 3 months ago