yliu-cs / SSRLinks
SSR: Enhancing Depth Perception in Vision-Language Models via Rationale-Guided Spatial Reasoning
☆18Updated 3 months ago
Alternatives and similar repositories for SSR
Users that are interested in SSR are comparing it to the libraries listed below
Sorting:
- [NeurIPS-2024] The offical Implementation of "Instruction-Guided Visual Masking"☆38Updated 9 months ago
- ☆19Updated 3 months ago
- [NeurIPS 2024] Official Repository of Multi-Object Hallucination in Vision-Language Models☆29Updated 9 months ago
- ☆18Updated 8 months ago
- Github repository for "Bring Reason to Vision: Understanding Perception and Reasoning through Model Merging" (ICML 2025)☆72Updated 2 months ago
- ☆71Updated 8 months ago
- ☆45Updated 8 months ago
- WorldSense: Evaluating Real-world Omnimodal Understanding for Multimodal LLMs☆28Updated 4 months ago
- Github repository for "Why Is Spatial Reasoning Hard for VLMs? An Attention Mechanism Perspective on Focus Areas" (ICML 2025)☆42Updated 3 months ago
- ☆12Updated 7 months ago
- [NeurIPS 2024] Calibrated Self-Rewarding Vision Language Models☆79Updated last year
- ☆79Updated last month
- Official PyTorch implementation of RACRO (https://www.arxiv.org/abs/2506.04559)☆17Updated 2 months ago
- Code for DeCo: Decoupling token compression from semanchc abstraction in multimodal large language models☆67Updated last month
- TemporalBench: Benchmarking Fine-grained Temporal Understanding for Multimodal Video Models☆33Updated 9 months ago
- Official Repository of LatentSeek☆60Updated 2 months ago
- ✨✨The Curse of Multi-Modalities (CMM): Evaluating Hallucinations of Large Multimodal Models across Language, Visual, and Audio☆46Updated last month
- Video-Holmes: Can MLLM Think Like Holmes for Complex Video Reasoning?☆68Updated last month
- DeepPerception: Advancing R1-like Cognitive Visual Perception in MLLMs for Knowledge-Intensive Visual Grounding☆65Updated 2 months ago
- ☆21Updated 9 months ago
- Official implementation of "Traceable Evidence Enhanced Visual Grounded Reasoning: Evaluation and Methodology"☆60Updated last month
- Official repository for "RLVR-World: Training World Models with Reinforcement Learning", https://arxiv.org/abs/2505.13934☆79Updated 2 months ago
- Official implementation of "Automated Generation of Challenging Multiple-Choice Questions for Vision Language Model Evaluation" (CVPR 202…☆33Updated 3 months ago
- The code for "VISTA: Enhancing Long-Duration and High-Resolution Video Understanding by VIdeo SpatioTemporal Augmentation" [CVPR2025]☆20Updated 6 months ago
- (ICLR 2025 Spotlight) Official code repository for Interleaved Scene Graph.☆27Updated 3 weeks ago
- Think or Not? Selective Reasoning via Reinforcement Learning for Vision-Language Models☆40Updated 2 weeks ago
- G1: Bootstrapping Perception and Reasoning Abilities of Vision-Language Model via Reinforcement Learning☆78Updated 3 months ago
- [ICCV 2025] Official code for "AIM: Adaptive Inference of Multi-Modal LLMs via Token Merging and Pruning"☆38Updated 2 months ago
- Official repo of the ICLR 2025 paper "MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos"☆29Updated last month
- The official implement of "Grounded Chain-of-Thought for Multimodal Large Language Models"☆13Updated last month