aim-uofa / Omni-R1
Official Repo of Omni-R1: Reinforcement Learning for Omnimodal Reasoning via Two-System Collaboration
☆69 Updated last month
Alternatives and similar repositories for Omni-R1
Users who are interested in Omni-R1 are comparing it to the repositories listed below.
- ACTIVE-O3: Empowering Multimodal Large Language Models with Active Perception via GRPO ☆62 Updated last month
- Code release for "PISA Experiments: Exploring Physics Post-Training for Video Diffusion Models by Watching Stuff Drop" (ICML 2025) ☆36 Updated 2 months ago
- MetaSpatial leverages reinforcement learning to enhance 3D spatial reasoning in vision-language models (VLMs), enabling more structured, … ☆156 Updated 2 months ago
- Official implementation of Spatial-MLLM: Boosting MLLM Capabilities in Visual-based Spatial Intelligence ☆285 Updated 3 weeks ago
- Official implementation of "Ross3D: Reconstructive Visual Instruction Tuning with 3D-Awareness" ☆35 Updated 3 weeks ago
- [ICCV 2025] Code Release of Harmonizing Visual Representations for Unified Multimodal Understanding and Generation ☆141 Updated last month
- SpaceR: The first MLLM empowered by SG-RLVR for video spatial reasoning ☆67 Updated last week
- [CVPR 2025] The code for the paper "Video-3D LLM: Learning Position-Aware Video Representation for 3D Scene Understanding" ☆133 Updated last month
- UniFork: Exploring Modality Alignment for Unified Multimodal Understanding and Generation ☆38 Updated 2 weeks ago
- WISE: A World Knowledge-Informed Semantic Evaluation for Text-to-Image Generation ☆126 Updated last month
- OST-Bench: Evaluating the Capabilities of MLLMs in Online Spatio-temporal Scene Understanding ☆41 Updated this week
- Official repository of "ScaleCap: Inference-Time Scalable Image Captioning via Dual-Modality Debiasing" ☆50 Updated 3 weeks ago
- [CVPR 2025 (Oral)] Open implementation of "RandAR" ☆177 Updated 3 months ago
- A list of works on video generation towards world model ☆157 Updated last week
- Official implementation for "Think Before You Segment: High-Quality Reasoning Segmentation with GPT Chain of Thoughts" ☆17 Updated 2 weeks ago
- Code for MetaMorph: Multimodal Understanding and Generation via Instruction Tuning ☆193 Updated 2 months ago
- A paper list for spatial reasoning ☆119 Updated last month
- ☆30 Updated 7 months ago
- [ICCV 2025] A Simple yet Effective Pathway to Empowering LLaVA to Understand and Interact with 3D World ☆281 Updated this week
- A collection of vision foundation models unifying understanding and generation. ☆56 Updated 6 months ago
- Unified Vision-Language-Action Model ☆128 Updated 2 weeks ago
- ☆128 Updated 2 weeks ago
- ☆86 Updated 3 weeks ago
- Long-RL: Scaling RL to Long Sequences ☆323 Updated this week
- UniVG-R1: Reasoning Guided Universal Visual Grounding with Reinforcement Learning ☆123 Updated last month
- [ICLR'25] Reconstructive Visual Instruction Tuning ☆97 Updated 3 months ago
- [CVPR 2025] Official PyTorch Implementation of GLUS: Global-Local Reasoning Unified into A Single Large Language Model for Video Segmenta… ☆45 Updated 3 weeks ago
- Empowering Unified MLLM with Multi-granular Visual Generation ☆126 Updated 6 months ago
- ☆69 Updated 2 weeks ago
- Official repository of "GoT: Unleashing Reasoning Capability of Multimodal Large Language Model for Visual Generation and Editing" ☆263 Updated 2 months ago