taco-group / MapBench
☆35 · Updated 3 months ago
Alternatives and similar repositories for MapBench
Users who are interested in MapBench are comparing it to the repositories listed below.
- ☆41 · Updated 7 months ago
- ACTIVE-O3: Empowering Multimodal Large Language Models with Active Perception via GRPO ☆78 · Updated 2 months ago
- [arXiv 2025] Can MLLMs Guide Me Home? A Benchmark Study on Fine-Grained Visual Reasoning from Transit Maps ☆72 · Updated 3 weeks ago
- [EMNLP-2025 Oral] ZoomEye: Enhancing Multimodal LLMs with Human-Like Zooming Capabilities through Tree-Based Image Exploration ☆71 · Updated 2 months ago
- [ICLR 2025] Official implementation and benchmark evaluation repository of <PhysBench: Benchmarking and Enhancing Vision-Language Models … ☆83 · Updated 2 weeks ago
- ☆63 · Updated last month
- Code release for "SegLLM: Multi-round Reasoning Segmentation" ☆127 · Updated 11 months ago
- Visual Planning: Let's Think Only with Images ☆294 · Updated 8 months ago
- Dream-VL and Dream-VLA, a diffusion VLM and a diffusion VLA. ☆101 · Updated 3 weeks ago
- STI-Bench: Are MLLMs Ready for Precise Spatial-Temporal World Understanding? ☆35 · Updated 3 weeks ago
- Visual Embodied Brain: Let Multimodal Large Language Models See, Think, and Control in Spaces ☆87 · Updated 8 months ago
- [NeurIPS-2024] The official implementation of "Instruction-Guided Visual Masking" ☆40 · Updated last year
- ☆28 · Updated 9 months ago
- ☆97 · Updated 7 months ago
- ☆119 · Updated 3 weeks ago
- Scaffold Prompting to promote LMMs ☆45 · Updated last year
- Evaluating Deep Multimodal Reasoning in Vision-Centric Agentic Tasks ☆36 · Updated 2 months ago
- Scaling Spatial Intelligence with Multimodal Foundation Models ☆160 · Updated 3 weeks ago
- ☆58 · Updated 8 months ago
- [NeurIPS'24] SpatialEval: a benchmark to evaluate spatial reasoning abilities of MLLMs and LLMs ☆58 · Updated last year
- High-Resolution Visual Reasoning via Multi-Turn Grounding-Based Reinforcement Learning ☆52 · Updated 6 months ago
- [NeurIPS'25] SSR: Enhancing Depth Perception in Vision-Language Models via Rationale-Guided Spatial Reasoning ☆38 · Updated 3 months ago
- Official code of the paper "VideoMolmo: Spatio-Temporal Grounding meets Pointing" ☆53 · Updated 7 months ago
- Official Implementation of ISR-DPO: Aligning Large Multimodal Models for Videos by Iterative Self-Retrospective DPO (AAAI'25) ☆23 · Updated 2 months ago
- 🤖 [ICLR'25] Multimodal Video Understanding Framework (MVU) ☆54 · Updated last year
- Official implementation of "VIRAL: Visual Representation Alignment for MLLMs". ☆146 · Updated 4 months ago
- Holistic Evaluation of Multimodal LLMs on Spatial Intelligence ☆77 · Updated 2 weeks ago
- Code for "Scaling Language-Free Visual Representation Learning" paper (Web-SSL). ☆200 · Updated 9 months ago
- [IJCV 2024] ☆19 · Updated last year
- ☆93 · Updated last month