LiamLian0727 / Euclids_Gift
This repo is the official implementation of "Euclid’s Gift: Enhancing Spatial Perception and Reasoning in Vision‑Language Models via Geometric Surrogate Tasks"
☆23 · Updated last month
Alternatives and similar repositories for Euclids_Gift
Users interested in Euclids_Gift are comparing it to the repositories listed below.
- Reinforcing Spatial Reasoning in Vision-Language Models with Interwoven Thinking and Visual Drawing ☆84 · Updated 4 months ago
- Official release of "Spatial-SSRL: Enhancing Spatial Understanding via Self-Supervised Reinforcement Learning" ☆91 · Updated 3 weeks ago
- We introduce 'Thinking with Video', a new paradigm leveraging video generation for multimodal reasoning. Our VideoThinkBench shows that S… ☆225 · Updated this week
- A collection of awesome "thinking with videos" papers. ☆73 · Updated 2 weeks ago
- The official repository of our paper "Reinforcing Video Reasoning with Focused Thinking" ☆32 · Updated 6 months ago
- Machine Mental Imagery: Empower Multimodal Reasoning with Latent Visual Tokens (arXiv 2025) ☆207 · Updated 4 months ago
- ☆65 · Updated last month
- ☆57 · Updated 2 months ago
- ☆111 · Updated 4 months ago
- ☆152 · Updated 3 weeks ago
- [NeurIPS'24] SpatialEval: a benchmark to evaluate spatial reasoning abilities of MLLMs and LLMs ☆57 · Updated 10 months ago
- [ICLR 2025] See What You Are Told: Visual Attention Sink in Large Multimodal Models ☆79 · Updated 10 months ago
- Official repository of 'ScaleCap: Inference-Time Scalable Image Captioning via Dual-Modality Debiasing' ☆59 · Updated 5 months ago
- Incentivizing "Thinking with Long Videos" via Native Tool Calling ☆142 · Updated this week
- [EMNLP 2025 Main] Video Compression Commander: Plug-and-Play Inference Acceleration for Video Large Language Models ☆44 · Updated this week
- 🔥 [NeurIPS 2025] Official implementation of "Generate, but Verify: Reducing Visual Hallucination in Vision-Language Models with Retrospe… ☆49 · Updated 3 months ago
- ViewSpatial-Bench: Evaluating Multi-perspective Spatial Localization in Vision-Language Models ☆66 · Updated 6 months ago
- Visual Planning: Let's Think Only with Images ☆285 · Updated 7 months ago
- [arXiv 2025] Can MLLMs Guide Me Home? A Benchmark Study on Fine-Grained Visual Reasoning from Transit Maps ☆70 · Updated last month
- This repository is the official implementation of "Look-Back: Implicit Visual Re-focusing in MLLM Reasoning". ☆73 · Updated 5 months ago
- ☆36 · Updated last month
- [NeurIPS 2025] MINT-CoT: Enabling Interleaved Visual Tokens in Mathematical Chain-of-Thought Reasoning ☆93 · Updated 3 months ago
- [NeurIPS 2024] Repo for the paper 'ControlMLLM: Training-Free Visual Prompt Learning for Multimodal Large Language Models' ☆202 · Updated 5 months ago
- Collections of Papers and Projects for Multimodal Reasoning. ☆106 · Updated 7 months ago
- Official implementation of "Traceable Evidence Enhanced Visual Grounded Reasoning: Evaluation and Methodology" ☆71 · Updated last month
- [ICCV 2025] ONLY: One-Layer Intervention Sufficiently Mitigates Hallucinations in Large Vision-Language Models ☆43 · Updated 5 months ago
- 🔥 CVPR 2025 Multimodal Large Language Models Paper List ☆153 · Updated 9 months ago
- https://huggingface.co/datasets/multimodal-reasoning-lab/Zebra-CoT ☆108 · Updated last month
- TiViBench: Benchmarking Think-in-Video Reasoning for Video Generative Models ☆62 · Updated 3 weeks ago
- Does Understanding Inform Generation in Unified Multimodal Models? From Analysis to Path Forward ☆54 · Updated 3 weeks ago