aim-uofa / Active-o3
ACTIVE-O3: Empowering Multimodal Large Language Models with Active Perception via GRPO
☆68 · Updated 2 months ago
Alternatives and similar repositories for Active-o3
Users interested in Active-o3 are comparing it to the repositories listed below
- ☆41 · Updated 2 months ago
- Machine Mental Imagery: Empower Multimodal Reasoning with Latent Visual Tokens (arXiv 2025) ☆115 · Updated last week
- ☆87 · Updated last month
- Visual Planning: Let's Think Only with Images ☆262 · Updated 2 months ago
- Official Repo of Omni-R1: Reinforcement Learning for Omnimodal Reasoning via Two-System Collaboration ☆73 · Updated 2 months ago
- Autoregressive Semantic Visual Reconstruction Helps VLMs Understand Better ☆36 · Updated last month
- SpaceR: The first MLLM empowered by SG-RLVR for video spatial reasoning ☆71 · Updated last month
- Pixel-Level Reasoning Model trained with RL ☆187 · Updated last month
- Code and dataset link for "DenseWorld-1M: Towards Detailed Dense Grounded Caption in the Real World" ☆96 · Updated last month
- The official repository for our paper, "Open Vision Reasoner: Transferring Linguistic Cognitive Behavior for Visual Reasoning". ☆125 · Updated 3 weeks ago
- Ego-R1: Chain-of-Tool-Thought for Ultra-Long Egocentric Video Reasoning ☆97 · Updated last month
- Video-Holmes: Can MLLM Think Like Holmes for Complex Video Reasoning? ☆63 · Updated 3 weeks ago
- ☆62 · Updated this week
- MetaSpatial leverages reinforcement learning to enhance 3D spatial reasoning in vision-language models (VLMs), enabling more structured, … ☆162 · Updated 3 months ago
- [arXiv: 2502.05178] QLIP: Text-Aligned Visual Tokenization Unifies Auto-Regressive Multimodal Understanding and Generation ☆76 · Updated 5 months ago
- ☆47 · Updated 2 months ago
- Structured Video Comprehension of Real-World Shorts ☆132 · Updated this week
- [CVPR 25] A framework named B^2-DiffuRL for RL-based diffusion model fine-tuning. ☆34 · Updated 4 months ago
- [CVPR'2025] VoCo-LLaMA: This repo is the official implementation of "VoCo-LLaMA: Towards Vision Compression with Large Language Models". ☆180 · Updated last month
- ☆30 · Updated 8 months ago
- ☆194 · Updated this week
- Reinforcing Spatial Reasoning in Vision-Language Models with Interwoven Thinking and Visual Drawing ☆55 · Updated last week
- Visual Embodied Brain: Let Multimodal Large Language Models See, Think, and Control in Spaces ☆77 · Updated 2 months ago
- Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment ☆54 · Updated 2 weeks ago
- Uni-CoT: Towards Unified Chain-of-Thought Reasoning Across Text and Vision ☆66 · Updated this week
- [CVPR 2025] Official PyTorch Implementation of GLUS: Global-Local Reasoning Unified into A Single Large Language Model for Video Segmenta… ☆48 · Updated last month
- TStar is a unified temporal search framework for long-form video question answering ☆59 · Updated 4 months ago
- Multi-SpatialMLLM Multi-Frame Spatial Understanding with Multi-Modal Large Language Models ☆140 · Updated 2 months ago
- Implementation for "The Scalability of Simplicity: Empirical Analysis of Vision-Language Learning with a Single Transformer" ☆58 · Updated 2 weeks ago
- [ICLR'25] Reconstructive Visual Instruction Tuning ☆101 · Updated 4 months ago