yliu-cs / SSRLinks
SSR: Enhancing Depth Perception in Vision-Language Models via Rationale-Guided Spatial Reasoning
☆15Updated last week
Alternatives and similar repositories for SSR
Users that are interested in SSR are comparing it to the libraries listed below
Sorting:
- G1: Bootstrapping Perception and Reasoning Abilities of Vision-Language Model via Reinforcement Learning☆51Updated 2 weeks ago
- Official implementation of MC-LLaVA.☆28Updated this week
- ☆69Updated 6 months ago
- ☆36Updated last month
- ☆12Updated 4 months ago
- [NeurIPS 2024] Official Repository of Multi-Object Hallucination in Vision-Language Models☆29Updated 6 months ago
- [NeurIPS-2024] The offical Implementation of "Instruction-Guided Visual Masking"☆35Updated 6 months ago
- [ICLR 2025] Official implementation and benchmark evaluation repository of <PhysBench: Benchmarking and Enhancing Vision-Language Models …☆64Updated this week
- VideoREPA: Learning Physics for Video Generation through Relational Alignment with Foundation Models☆16Updated last week
- TemporalBench: Benchmarking Fine-grained Temporal Understanding for Multimodal Video Models☆34Updated 6 months ago
- ☆13Updated 3 weeks ago
- [CVPR 2025] Official PyTorch Implementation of GLUS: Global-Local Reasoning Unified into A Single Large Language Model for Video Segmenta…☆39Updated last month
- Evaluate Multimodal LLMs as Embodied Agents☆49Updated 3 months ago
- Github repository for "Why Is Spatial Reasoning Hard for VLMs? An Attention Mechanism Perspective on Focus Areas" (ICML 2025)☆34Updated last month
- ☆43Updated 5 months ago
- Official Implementation of ISR-DPO:Aligning Large Multimodal Models for Videos by Iterative Self-Retrospective DPO (AAAI'25)☆19Updated 3 months ago
- The official implement of "Grounded Chain-of-Thought for Multimodal Large Language Models"☆11Updated 2 months ago
- ☆26Updated 6 months ago
- Official repo of the ICLR 2025 paper "MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos"☆28Updated 8 months ago
- ☆21Updated 7 months ago
- HAZARD challenge☆34Updated last month
- Multi-Stage Vision Token Dropping: Towards Efficient Multimodal Large Language Model☆31Updated 5 months ago
- (ICLR 2025 Spotlight) Official code repository for Interleaved Scene Graph.☆22Updated 4 months ago
- DeepPerception: Advancing R1-like Cognitive Visual Perception in MLLMs for Knowledge-Intensive Visual Grounding☆58Updated 2 months ago
- MME-Unify: A Comprehensive Benchmark for Unified Multimodal Understanding and Generation Models☆35Updated last month
- ☆12Updated 6 months ago
- WorldSense: Evaluating Real-world Omnimodal Understanding for Multimodal LLMs☆23Updated last month
- [CVPR 25] A framework named B^2-DiffuRL for RL-based diffusion model fine-tuning.☆29Updated 2 months ago
- Official code for MotionBench (CVPR 2025)☆40Updated 3 months ago
- Official code for "AIM: Adaptive Inference of Multi-Modal LLMs via Token Merging and Pruning"☆28Updated 3 weeks ago