jiayuww / SpatialEval
[NeurIPS'24] SpatialEval: a benchmark to evaluate spatial reasoning abilities of MLLMs and LLMs
☆59Updated last year
Alternatives and similar repositories for SpatialEval
Users who are interested in SpatialEval are comparing it to the libraries listed below
- ☆117Updated 6 months ago
- ☆46Updated 5 months ago
- STI-Bench: Are MLLMs Ready for Precise Spatial-Temporal World Understanding?☆35Updated 3 weeks ago
- ☆41Updated 8 months ago
- Machine Mental Imagery: Empower Multimodal Reasoning with Latent Visual Tokens (arXiv 2025)☆240Updated 6 months ago
- Code and datasets for "What’s “up” with vision-language models? Investigating their struggle with spatial reasoning".☆70Updated last year
- [NeurIPS 2025] Reinforcing Spatial Reasoning in Vision-Language Models with Interwoven Thinking and Visual Drawing☆90Updated 6 months ago
- [ICLR 2025] Official implementation and benchmark evaluation repository of <PhysBench: Benchmarking and Enhancing Vision-Language Models …☆83Updated 3 weeks ago
- GitHub repository for "Why Is Spatial Reasoning Hard for VLMs? An Attention Mechanism Perspective on Focus Areas" (ICML 2025)☆68Updated 9 months ago
- Official implementation of "Why are Visually-Grounded Language Models Bad at Image Classification?" (NeurIPS 2024)☆96Updated last year
- GRPO Algorithm for Llava Architecture (Based on Verl)☆47Updated 9 months ago
- ☆124Updated 3 months ago
- SpaceR: The first MLLM empowered by SG-RLVR for video spatial reasoning☆103Updated 7 months ago
- ☆68Updated 3 months ago
- Official Implementation of "Geometrically-Constrained Agent for Spatial Reasoning"☆49Updated last month
- [NeurIPS 2024] Official Repository of Multi-Object Hallucination in Vision-Language Models☆34Updated last year
- [CVPR'24 Highlight] The official code and data for paper "EgoThink: Evaluating First-Person Perspective Thinking Capability of Vision-Lan…☆63Updated 10 months ago
- [CVPR 2025 (Oral)] Mitigating Hallucinations in Large Vision-Language Models via DPO: On-Policy Data Hold the Key☆104Updated last month
- Official codebase for the paper Latent Visual Reasoning☆109Updated 3 months ago
- DeepPerception: Advancing R1-like Cognitive Visual Perception in MLLMs for Knowledge-Intensive Visual Grounding☆66Updated 8 months ago
- [ICLR'25] Reconstructive Visual Instruction Tuning☆135Updated 10 months ago
- [NeurIPS'24] This repository is the implementation of "SpatialRGPT: Grounded Spatial Reasoning in Vision Language Models"☆309Updated last year
- https://huggingface.co/datasets/multimodal-reasoning-lab/Zebra-CoT☆117Updated last week
- [ICLR'26] Traceable Evidence Enhanced Visual Grounded Reasoning: Evaluation and Methodology☆73Updated 2 weeks ago
- [NeurIPS'24 Spotlight] Visual CoT: Advancing Multi-Modal Language Models with a Comprehensive Dataset and Benchmark for Chain-of-Thought …☆423Updated last year
- MetaSpatial leverages reinforcement learning to enhance 3D spatial reasoning in vision-language models (VLMs), enabling more structured, …☆203Updated 9 months ago
- [EMNLP 2025 Demo] Extracting internal representations from vision-language models. Beta version.☆108Updated this week
- [NeurIPS-2024] The official Implementation of "Instruction-Guided Visual Masking"☆40Updated last year
- Visualizing the attention of vision-language models☆279Updated 11 months ago
- Visual Planning: Let's Think Only with Images☆295Updated 8 months ago