ZJU-REAL / ViewSpatial-Bench
ViewSpatial-Bench: Evaluating Multi-perspective Spatial Localization in Vision-Language Models
☆52Updated last month
Alternatives and similar repositories for ViewSpatial-Bench
Users who are interested in ViewSpatial-Bench are comparing it to the repositories listed below.
- A paper list for spatial reasoning☆121Updated last month
- MetaSpatial leverages reinforcement learning to enhance 3D spatial reasoning in vision-language models (VLMs), enabling more structured, …☆157Updated 2 months ago
- SpaceR: The first MLLM empowered by SG-RLVR for video spatial reasoning☆69Updated last week
- Reinforcing Spatial Reasoning in Vision-Language Models with Interwoven Thinking and Visual Drawing☆49Updated 3 weeks ago
- ☆57Updated 3 months ago
- ☆76Updated 3 weeks ago
- Visual Planning: Let's Think Only with Images☆258Updated last month
- ☆89Updated 3 months ago
- Data and Code for CVPR 2025 paper "MMVU: Measuring Expert-Level Multi-Discipline Video Understanding"☆68Updated 4 months ago
- VCR-Bench: A Comprehensive Evaluation Framework for Video Chain-of-Thought Reasoning☆32Updated 2 weeks ago
- ☆63Updated this week
- Machine Mental Imagery: Empower Multimodal Reasoning with Latent Visual Tokens (arXiv 2025)☆102Updated last week
- [NeurIPS'24] SpatialEval: a benchmark to evaluate spatial reasoning abilities of MLLMs and LLMs☆45Updated 5 months ago
- [LLaVA-Video-R1]✨First Adaptation of R1 to LLaVA-Video (2025-03-18)☆29Updated 2 months ago
- ☆38Updated last month
- Pixel-Level Reasoning Model trained with RL☆167Updated 2 weeks ago
- ☆50Updated last month
- ☆17Updated 2 months ago
- Official repository of "ScaleCap: Inference-Time Scalable Image Captioning via Dual-Modality Debiasing"☆50Updated 3 weeks ago
- [ACM MM 2025] TimeChat-online: 80% Visual Tokens are Naturally Redundant in Streaming Videos☆60Updated this week
- Official repo for EscapeCraft (a 3D environment for room escape) and the MM-Escape benchmark. This work was accepted by ICCV 2025.☆27Updated last week
- ⭐️ Reason-RFT: Reinforcement Fine-Tuning for Visual Reasoning.☆174Updated last month
- [CVPR2024] This is the official implementation of MP5☆103Updated last year
- TinyLLaVA-Video-R1: Towards Smaller LMMs for Video Reasoning☆83Updated last month
- Interleaving Reasoning: Next-Generation Reasoning Systems for AGI☆84Updated last week
- A Comprehensive Survey on Evaluating Reasoning Capabilities in Multimodal Large Language Models.☆66Updated 4 months ago
- ☆60Updated last month
- Official implementation of GUI-R1 : A Generalist R1-Style Vision-Language Action Model For GUI Agents☆145Updated 2 months ago
- Official implementation of Spatial-MLLM: Boosting MLLM Capabilities in Visual-based Spatial Intelligence☆290Updated 3 weeks ago
- ☆25Updated 5 months ago