aim-uofa / Omni-R1Links

[NeurIPS 2025] Official Repo of Omni-R1: Reinforcement Learning for Omnimodal Reasoning via Two-System Collaboration

☆92

Alternatives and similar repositories for Omni-R1

Users that are interested in Omni-R1 are comparing it to the libraries listed below

Sorting:

zhang9302002 / ThinkingWithVideos
The official code of "Thinking With Videos: Multimodal Tool-Augmented Reinforcement Learning for Long Video Reasoning"
☆64Updated last month
OuyangKun10 / SpaceR
SpaceR: The first MLLM empowered by SG-RLVR for video spatial reasoning
☆98Updated 4 months ago
Haochen-Wang409 / ross
[ICLR'25] Reconstructive Visual Instruction Tuning
☆127Updated 7 months ago
cambrian-mllm / cambrian-s
Cambrian-S: Towards Spatial Supersensing in Video
☆375Updated 2 weeks ago
aim-uofa / Active-o3
ACTIVE-O3: Empowering Multimodal Large Language Models with Active Perception via GRPO
☆75Updated last week
Haochen-Wang409 / ross3d
[ICCV'25] Ross3D: Reconstructive Visual Instruction Tuning with 3D-Awareness
☆62Updated 4 months ago
TIGER-AI-Lab / Pixel-Reasoner
Pixel-Level Reasoning Model trained with RL [NeuIPS25]
☆251Updated 3 weeks ago
UMass-Embodied-AGI / Mirage
Machine Mental Imagery: Empower Multimodal Reasoning with Latent Visual Tokens (arXiv 2025)
☆194Updated 3 months ago
InternLM / Spatial-SSRL
Official release of "Spatial-SSRL: Enhancing Spatial Understanding via Self-Supervised Reinforcement Learning"
☆75Updated this week
TencentARC / TokLIP
TokLIP: Marry Visual Tokens to CLIP for Multimodal Comprehension and Generation
☆231Updated 3 months ago
PzySeere / MetaSpatial
MetaSpatial leverages reinforcement learning to enhance 3D spatial reasoning in vision-language models (VLMs), enabling more structured, …
☆193Updated 6 months ago
AMAP-ML / UniVG-R1
UniVG-R1: Reasoning Guided Universal Visual Grounding with Reinforcement Learning
☆150Updated 5 months ago
showlab / VideoLISA
[NeurlPS 2024] One Token to Seg Them All: Language Instructed Reasoning Segmentation in Videos
☆141Updated 11 months ago
Go2Heart / StreamFormer
[ICCV 2025 Oral] Official implementation of Learning Streaming Video Representation via Multitask Training.
☆66Updated last week
TencentARC / Video-Holmes
Video-Holmes: Can MLLM Think Like Holmes for Complex Video Reasoning?
☆77Updated 4 months ago
yu-rp / VisualPerceptionToken
☆128Updated 8 months ago
GLUS-video / GLUS
[CVPR 2025] Official PyTorch Implementation of GLUS: Global-Local Reasoning Unified into A Single Large Language Model for Video Segmenta…
☆62Updated 5 months ago
Wakals / CoVT
☆52Updated this week
Gabesarch / grounded-rl
☆104Updated 4 months ago
NVlabs / VideoITG
☆89Updated 2 months ago
hshjerry / VideoEspresso
[CVPR 2025 Oral] VideoEspresso: A Large-Scale Chain-of-Thought Dataset for Fine-Grained Video Reasoning via Core Frame Selection
☆128Updated 4 months ago
zhouyiks / CoLVA
☆40Updated 4 months ago
hanghuacs / FineCaption
☆37Updated 5 months ago
Karine-Huang / GenMAC
☆30Updated 11 months ago
Cooperx521 / ScaleCap
Official repository of 'ScaleCap: Inference-Time Scalable Image Captioning via Dual-Modality Debiasing’
☆57Updated 5 months ago
diankun-wu / Spatial-MLLM
Official implementation of Spatial-MLLM: Boosting MLLM Capabilities in Visual-based Spatial Intelligence
☆387Updated 5 months ago
egolife-ai / Ego-R1
Ego-R1: Chain-of-Tool-Thought for Ultra-Long Egocentric Video Reasoning
☆130Updated 3 months ago
aim-uofa / dLLM-MidTruth
☆55Updated 3 months ago
cvlab-kaist / VIRAL
Official implementation of "VIRAL: Visual Representation Alignment for MLLMs".
☆136Updated 2 months ago
arctanxarc / UniCTokens
A framework for unified personalized model, achieving mutual enhancement between personalized understanding and generation. Demonstrating…
☆124Updated last month