ZJU-REAL / ViewSpatial-Bench
ViewSpatial-Bench: Evaluating Multi-perspective Spatial Localization in Vision-Language Models
☆48Updated 3 weeks ago
Alternatives and similar repositories for ViewSpatial-Bench
Users who are interested in ViewSpatial-Bench are comparing it to the repositories listed below.
- Mind the Gap: Bridging Thought Leap for Improved CoT Tuning https://arxiv.org/abs/2505.14684☆37Updated last month
- A paper list for spatial reasoning☆94Updated 2 weeks ago
- Data and Code for CVPR 2025 paper "MMVU: Measuring Expert-Level Multi-Discipline Video Understanding"☆68Updated 3 months ago
- MetaSpatial leverages reinforcement learning to enhance 3D spatial reasoning in vision-language models (VLMs), enabling more structured, …☆133Updated last month
- A Comprehensive Survey on Evaluating Reasoning Capabilities in Multimodal Large Language Models.☆64Updated 3 months ago
- ☆86Updated 3 months ago
- Code for Let LLMs Break Free from Overthinking via Self-Braking Tuning. https://arxiv.org/abs/2505.14604☆41Updated 2 weeks ago
- ☆34Updated 2 weeks ago
- A Self-Training Framework for Vision-Language Reasoning☆80Updated 5 months ago
- Collections of Papers and Projects for Multimodal Reasoning.☆105Updated 2 months ago
- ☆46Updated last week
- ☆16Updated 5 months ago
- ☆24Updated 4 months ago
- Official repo for EscapeCraft (a 3D environment for room escape) and benchmark MM-Escape☆16Updated 3 weeks ago
- TimeChat-online: 80% Visual Tokens are Naturally Redundant in Streaming Videos☆51Updated last week
- Interleaving Reasoning: Next-Generation Reasoning Systems for AGI☆30Updated last week
- VCR-Bench: A Comprehensive Evaluation Framework for Video Chain-of-Thought Reasoning☆31Updated 2 months ago
- A Python script for downloading Hugging Face datasets and models.☆19Updated 2 months ago
- [Arxiv Paper 2504.09130]: VisuoThink: Empowering LVLM Reasoning with Multimodal Tree Search☆18Updated last month
- Official implementation of MIA-DPO☆58Updated 5 months ago
- Code for the paper "InftyThink: Breaking the Length Limits of Long-Context Reasoning in Large Language Models"☆28Updated 3 weeks ago
- [ICML 2025] Official implementation of paper 'Look Twice Before You Answer: Memory-Space Visual Retracing for Hallucination Mitigation in…☆136Updated last week
- ☆53Updated 2 months ago
- SpaceR: The first MLLM empowered by SG-RLVR for video spatial reasoning☆63Updated 2 weeks ago
- Code for "Stop Looking for Important Tokens in Multimodal Language Models: Duplication Matters More"☆54Updated last month
- ☆101Updated this week
- TinyLLaVA-Video-R1: Towards Smaller LMMs for Video Reasoning☆74Updated last month
- (CVPR 2025) PyramidDrop: Accelerating Your Large Vision-Language Models via Pyramid Visual Redundancy Reduction☆109Updated 3 months ago
- The official code of "VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models with Reinforcement Learning"☆118Updated 3 weeks ago
- OpenThinkIMG is an end-to-end open-source framework that empowers LVLMs to think with images.☆231Updated 3 weeks ago