yliu-cs / SSR
SSR: Enhancing Depth Perception in Vision-Language Models via Rationale-Guided Spatial Reasoning
☆17Updated 2 months ago
Alternatives and similar repositories for SSR
Users that are interested in SSR are comparing it to the libraries listed below
- ☆17Updated 8 months ago
- [NeurIPS-2024] The official implementation of "Instruction-Guided Visual Masking"☆36Updated 8 months ago
- [NeurIPS 2024] Official Repository of Multi-Object Hallucination in Vision-Language Models☆29Updated 8 months ago
- ☆28Updated 9 months ago
- [NeurIPS 2024] Calibrated Self-Rewarding Vision Language Models☆77Updated last year
- ☆45Updated 7 months ago
- ☆16Updated 2 months ago
- GitHub repository for "Why Is Spatial Reasoning Hard for VLMs? An Attention Mechanism Perspective on Focus Areas" (ICML 2025)☆42Updated 3 months ago
- ☆71Updated 8 months ago
- A Holistic Embodied Cognition Benchmark☆17Updated 4 months ago
- Official Repository of LatentSeek☆56Updated 2 months ago
- GitHub repository for "Bring Reason to Vision: Understanding Perception and Reasoning through Model Merging" (ICML 2025)☆68Updated 2 months ago
- ☆12Updated 7 months ago
- The official implementation of "Grounded Chain-of-Thought for Multimodal Large Language Models"☆13Updated 3 weeks ago
- TemporalBench: Benchmarking Fine-grained Temporal Understanding for Multimodal Video Models☆33Updated 9 months ago
- [ICCV 2025] Official code for "AIM: Adaptive Inference of Multi-Modal LLMs via Token Merging and Pruning"☆35Updated last month
- Official repo for EscapeCraft (a 3D environment for room escape) and the MM-Escape benchmark. This work was accepted by ICCV 2025.☆27Updated last month
- ✨✨The Curse of Multi-Modalities (CMM): Evaluating Hallucinations of Large Multimodal Models across Language, Visual, and Audio☆46Updated last month
- G1: Bootstrapping Perception and Reasoning Abilities of Vision-Language Model via Reinforcement Learning☆77Updated 2 months ago
- ☆72Updated 2 weeks ago
- Symmetrical Visual Contrastive Optimization: Aligning Vision-Language Models with Minimal Contrastive Images☆13Updated 2 months ago
- ☆21Updated 9 months ago
- Official repository for "RLVR-World: Training World Models with Reinforcement Learning", https://arxiv.org/abs/2505.13934☆74Updated 2 months ago
- TEMPURA enables video-language models to reason about causal event relationships and generate fine-grained, timestamped descriptions of u…☆21Updated 2 months ago
- [NeurIPS2024] Official code for (IMA) Implicit Multimodal Alignment: On the Generalization of Frozen LLMs to Multimodal Inputs☆20Updated 9 months ago
- Code repo for "Harnessing Negative Signals: Reinforcement Distillation from Teacher Data for LLM Reasoning"☆27Updated 2 weeks ago
- 🚀 Global Compression Commander: Plug-and-Play Inference Acceleration for High-Resolution Large Vision-Language Models☆30Updated 2 weeks ago
- Think or Not? Selective Reasoning via Reinforcement Learning for Vision-Language Models☆40Updated 3 weeks ago
- Official PyTorch implementation of RACRO (https://www.arxiv.org/abs/2506.04559)☆17Updated last month
- ☆18Updated last year