InternLM / Spatial-SSRL
Official release of "Spatial-SSRL: Enhancing Spatial Understanding via Self-Supervised Reinforcement Learning"
☆108Updated last month
Alternatives and similar repositories for Spatial-SSRL
Users who are interested in Spatial-SSRL are comparing it to the repositories listed below.
- Ego-R1: Chain-of-Tool-Thought for Ultra-Long Egocentric Video Reasoning☆137Updated 5 months ago
- [NeurIPS 2025] Official Repo of Omni-R1: Reinforcement Learning for Omnimodal Reasoning via Two-System Collaboration☆113Updated last month
- Incentivizing "Thinking with Long Videos" via Native Tool Calling☆183Updated this week
- Pixel-Level Reasoning Model trained with RL [NeurIPS25]☆267Updated 2 months ago
- https://huggingface.co/datasets/multimodal-reasoning-lab/Zebra-CoT☆117Updated 2 months ago
- Cambrian-S: Towards Spatial Supersensing in Video☆482Updated last month
- ☆66Updated 2 months ago
- Holistic Evaluation of Multimodal LLMs on Spatial Intelligence☆74Updated last week
- Machine Mental Imagery: Empower Multimodal Reasoning with Latent Visual Tokens (arXiv 2025)☆233Updated 5 months ago
- (ICLR 2026) Official repository of 'ScaleCap: Inference-Time Scalable Image Captioning via Dual-Modality Debiasing'☆58Updated this week
- UniVG-R1: Reasoning Guided Universal Visual Grounding with Reinforcement Learning☆156Updated 7 months ago
- Video-Holmes: Can MLLM Think Like Holmes for Complex Video Reasoning?☆86Updated 6 months ago
- A collection of awesome think with videos papers.☆83Updated last month
- SpaceR: The first MLLM empowered by SG-RLVR for video spatial reasoning☆103Updated 6 months ago
- [NeurIPS 2025] Reinforcing Spatial Reasoning in Vision-Language Models with Interwoven Thinking and Visual Drawing☆90Updated 6 months ago
- [ICLR'25] Reconstructive Visual Instruction Tuning☆133Updated 9 months ago
- The official code of "Thinking With Videos: Multimodal Tool-Augmented Reinforcement Learning for Long Video Reasoning"☆79Updated 3 months ago
- Official implementation of "Open-o3 Video: Grounded Video Reasoning with Explicit Spatio-Temporal Evidence"☆127Updated last month
- Official repository for the UAE paper, unified-GRPO, and unified-Bench☆156Updated 4 months ago
- [CVPR2025 Highlight] Insight-V: Exploring Long-Chain Visual Reasoning with Multimodal Large Language Models☆232Updated 2 months ago
- TimeLens: Rethinking Video Temporal Grounding with Multimodal LLMs☆97Updated this week
- TStar is a unified temporal search framework for long-form video question answering☆86Updated 4 months ago
- A collection of vision foundation models unifying understanding and generation.☆59Updated last year
- The code repository of UniRL☆51Updated 8 months ago
- ☆59Updated 5 months ago
- ☆132Updated 10 months ago
- ☆96Updated 7 months ago
- ☆47Updated last week
- ☆80Updated 7 months ago
- [NeurIPS 2024] One Token to Seg Them All: Language Instructed Reasoning Segmentation in Videos☆145Updated last year