LiamLian0727 / Euclids_Gift
This repo is the official implementation of "Euclid’s Gift: Enhancing Spatial Perception and Reasoning in Vision‑Language Models via Geometric Surrogate Tasks"
☆22Updated 3 weeks ago
Alternatives and similar repositories for Euclids_Gift
Users who are interested in Euclids_Gift are comparing it to the libraries listed below.
- Reinforcing Spatial Reasoning in Vision-Language Models with Interwoven Thinking and Visual Drawing☆79Updated 4 months ago
- Machine Mental Imagery: Empower Multimodal Reasoning with Latent Visual Tokens (arXiv 2025)☆194Updated 3 months ago
- ☆52Updated this week
- SpaceR: The first MLLM empowered by SG-RLVR for video spatial reasoning☆98Updated 4 months ago
- We introduce 'Thinking with Video', a new paradigm leveraging video generation for multimodal reasoning. Our VideoThinkBench shows that S…☆212Updated this week
- ☆60Updated 3 weeks ago
- ☆39Updated 2 months ago
- The official repository of our paper "Reinforcing Video Reasoning with Focused Thinking"☆29Updated 5 months ago
- ☆104Updated 4 months ago
- Official repository of the paper "A Glimpse to Compress: Dynamic Visual Token Pruning for Large Vision-Language Models"☆78Updated 2 months ago
- Collections of Papers and Projects for Multimodal Reasoning.☆105Updated 7 months ago
- ☆140Updated this week
- Code for paper: Reinforced Vision Perception with Tools☆62Updated last month
- [ICLR 2025] See What You Are Told: Visual Attention Sink in Large Multimodal Models☆72Updated 9 months ago
- ViewSpatial-Bench: Evaluating Multi-perspective Spatial Localization in Vision-Language Models☆66Updated 6 months ago
- Visual Planning: Let's Think Only with Images☆281Updated 6 months ago
- [NeurIPS 2025] MINT-CoT: Enabling Interleaved Visual Tokens in Mathematical Chain-of-Thought Reasoning☆88Updated 2 months ago
- [CVPR2025] BOLT: Boost Large Vision-Language Model Without Training for Long-form Video Understanding☆33Updated 8 months ago
- Pixel-Level Reasoning Model trained with RL [NeurIPS 2025]☆251Updated 3 weeks ago
- The official code for the paper: LLaVA-Scissor: Token Compression with Semantic Connected Components for Video LLMs☆113Updated 4 months ago
- [CVPR 2025] Official PyTorch Implementation of GLUS: Global-Local Reasoning Unified into A Single Large Language Model for Video Segmenta…☆62Updated 5 months ago
- Official repository of 'ScaleCap: Inference-Time Scalable Image Captioning via Dual-Modality Debiasing'☆58Updated 5 months ago
- Official release of "Spatial-SSRL: Enhancing Spatial Understanding via Self-Supervised Reinforcement Learning"☆75Updated this week
- [EMNLP 2025 Main] Video Compression Commander: Plug-and-Play Inference Acceleration for Video Large Language Models☆39Updated 2 weeks ago
- Does Understanding Inform Generation in Unified Multimodal Models? From Analysis to Path Forward☆43Updated this week
- ☆128Updated 8 months ago
- TokLIP: Marry Visual Tokens to CLIP for Multimodal Comprehension and Generation☆231Updated 3 months ago
- [NeurIPS 2025] Official Repo of Omni-R1: Reinforcement Learning for Omnimodal Reasoning via Two-System Collaboration☆92Updated 5 months ago
- This repository is the official implementation of "Look-Back: Implicit Visual Re-focusing in MLLM Reasoning".☆70Updated 4 months ago
- Official implementation of "Traceable Evidence Enhanced Visual Grounded Reasoning: Evaluation and Methodology"☆70Updated 3 weeks ago