zhijie-group / R1-Zero-VSILinks

☆42

Alternatives and similar repositories for R1-Zero-VSI

Users that are interested in R1-Zero-VSI are comparing it to the libraries listed below

Sorting:

MINT-SJTU / STI-Bench
STI-Bench : Are MLLMs Ready for Precise Spatial-Temporal World Understanding?
☆33Updated 5 months ago
yliu-cs / SSR
[NeurIPS'25] SSR: Enhancing Depth Perception in Vision-Language Models via Rationale-Guided Spatial Reasoning
☆36Updated 2 months ago
Gabesarch / grounded-rl
☆112Updated 5 months ago
OuyangKun10 / SpaceR
SpaceR: The first MLLM empowered by SG-RLVR for video spatial reasoning
☆101Updated 5 months ago
qizekun / OmniSpatial
OmniSpatial: Towards Comprehensive Spatial Reasoning Benchmark for Vision Language Models
☆76Updated 3 months ago
TencentARC / SEED-Bench-R1
☆96Updated 6 months ago
OpenGVLab / VeBrain
Visual Embodied Brain: Let Multimodal Large Language Models See, Think, and Control in Spaces
☆87Updated 6 months ago
physical-superintelligence-lab / PhysBench
[ICLR 2025] Official implementation and benchmark evaluation repository of <PhysBench: Benchmarking and Enhancing Vision-Language Models …
☆82Updated 7 months ago
EvolvingLMMs-Lab / MGPO
High-Resolution Visual Reasoning via Multi-Turn Grounding-Based Reinforcement Learning
☆51Updated 5 months ago
AntResearchNLP / ViLaSR
Reinforcing Spatial Reasoning in Vision-Language Models with Interwoven Thinking and Visual Drawing
☆84Updated 5 months ago
OpenSenseNova / SenseNova-SI
Scaling Spatial Intelligence with Multimodal Foundation Models
☆143Updated this week
TencentARC / Video-Holmes
Video-Holmes: Can MLLM Think Like Holmes for Complex Video Reasoning?
☆84Updated 5 months ago
2toinf / IVM
[NeurIPS-2024] The offical Implementation of "Instruction-Guided Visual Masking"
☆39Updated last year
TencentARC / DSR_Suite
☆35Updated last week
PzySeere / MetaSpatial
MetaSpatial leverages reinforcement learning to enhance 3D spatial reasoning in vision-language models (VLMs), enabling more structured, …
☆198Updated 7 months ago
om-ai-lab / ZoomEye
[EMNLP-2025 Oral] ZoomEye: Enhancing Multimodal LLMs with Human-Like Zooming Capabilities through Tree-Based Image Exploration
☆69Updated last month
ChenYi99 / EgoPlan
[IJCV] EgoPlan-Bench: Benchmarking Multimodal Large Language Models for Human-Level Planning
☆78Updated last year
UMass-Embodied-AGI / Mirage
Machine Mental Imagery: Empower Multimodal Reasoning with Latent Visual Tokens (arXiv 2025)
☆218Updated 5 months ago
TencentARC / GRPO-CARE
☆80Updated 6 months ago
aim-uofa / Active-o3
ACTIVE-O3: Empowering Multimodal Large Language Models with Active Perception via GRPO
☆76Updated last month
penghao-wu / visual_jigsaw
☆65Updated last month
mll-lab-nu / MindCube
☆115Updated 2 months ago
Haochen-Wang409 / TreeVGR
Official implementation of "Traceable Evidence Enhanced Visual Grounded Reasoning: Evaluation and Methodology"
☆72Updated last month
alanaai / EVUD
Egocentric Video Understanding Dataset (EVUD)
☆32Updated last year
marinero4972 / Open-o3-Video
Official implementation of "Open-o3 Video: Grounded Video Reasoning with Explicit Spatio-Temporal Evidence"
☆127Updated 2 weeks ago
Haochen-Wang409 / ross3d
[ICCV'25] Ross3D: Reconstructive Visual Instruction Tuning with 3D-Awareness
☆63Updated 5 months ago
SHI-Labs / Slow-Fast-Video-Multimodal-LLM
☆27Updated 8 months ago
NVlabs / QLIP
[arXiv: 2502.05178] QLIP: Text-Aligned Visual Tokenization Unifies Auto-Regressive Multimodal Understanding and Generation
☆94Updated 10 months ago
multimodal-reasoning-lab / Bagel-Zebra-CoT
https://huggingface.co/datasets/multimodal-reasoning-lab/Zebra-CoT
☆111Updated 2 months ago
KAIST-Visual-AI-Group / APC-VLM
[ICCV 2025] Official code for Perspective-Aware Reasoning in Vision-Language Models via Mental Imagery Simulation
☆49Updated 3 months ago