[NeurIPS 2025] Official Repo of Omni-R1: Reinforcement Learning for Omnimodal Reasoning via Two-System Collaboration
☆114Dec 3, 2025Updated 2 months ago
Alternatives and similar repositories for Omni-R1
Users who are interested in Omni-R1 are comparing it to the repositories listed below
- ACTIVE-O3: Empowering Multimodal Large Language Models with Active Perception via GRPO☆78Nov 17, 2025Updated 3 months ago
- [ICLR 2025 Spotlight] Boltzmann-Aligned Inverse Folding Model as a Predictor of Mutational Effects on Protein-Protein Interactions☆43Mar 10, 2025Updated 11 months ago
- One-shot and Few-shot 3D Editing without Per-Scene Optimization☆164Aug 21, 2025Updated 6 months ago
- ☆13May 17, 2025Updated 9 months ago
- ☆12Mar 22, 2025Updated 11 months ago
- Image Tokenizer Needs Post-Training☆24Oct 4, 2025Updated 4 months ago
- Official codes for the paper "GARDO: Reinforcing Diffusion Models without Reward Hacking"☆55Feb 2, 2026Updated last month
- Universal Video Temporal Grounding with Generative Multi-modal Large Language Models☆46Nov 25, 2025Updated 3 months ago
- [2026 AAAI] Think Before You Segment: An Object-aware Reasoning Agent for Referring Audio-Visual Segmentation☆19Nov 8, 2025Updated 3 months ago
- ☆14Apr 25, 2025Updated 10 months ago
- ☆15May 30, 2024Updated last year
- [ICLR 2024] Official PyTorch/Diffusers implementation of "Object-aware Inversion and Reassembly for Image Editing"☆88Aug 23, 2024Updated last year
- ☆23Jul 20, 2025Updated 7 months ago
- WorldSense: Evaluating Real-world Omnimodal Understanding for Multimodal LLMs☆41Jan 26, 2026Updated last month
- Official PyTorch implementation of RACRO (https://www.arxiv.org/abs/2506.04559)☆19Jul 1, 2025Updated 8 months ago
- ☆16Apr 4, 2025Updated 10 months ago
- Official implementation for "Think Before You Segment: High-Quality Reasoning Segmentation with GPT Chain of Thoughts"☆22Jun 28, 2025Updated 8 months ago
- ☆21Feb 29, 2024Updated 2 years ago
- This is the official code base of AgentNetTool in OpenCUA. Website: https://opencua.xlang.ai/☆39Sep 3, 2025Updated 5 months ago
- [ECCV 2024] This is the official implementation of "Stitched ViTs are Flexible Vision Backbones".☆29Jan 23, 2024Updated 2 years ago
- The official code of "VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models with Reinforcement Learning" [NeurIPS25]☆182Jun 5, 2025Updated 8 months ago
- Structured Video Comprehension of Real-World Shorts☆231Sep 21, 2025Updated 5 months ago
- [ICLR2025] GenPercept: Diffusion Models Trained with Large Data Are Transferable Visual Models☆219Jan 24, 2025Updated last year
- [ICLR'25] Official PyTorch implementation of "Framer: Interactive Frame Interpolation".☆502Jan 9, 2025Updated last year
- [CVPR 2024] Official PyTorch implementation of FreeCustom: Tuning-Free Customized Image Generation for Multi-Concept Composition☆176Sep 1, 2025Updated 6 months ago
- SFT+RL boosts multimodal reasoning☆46Jun 27, 2025Updated 8 months ago
- SophiaVL-R1: Reinforcing MLLMs Reasoning with Thinking Reward☆92Aug 8, 2025Updated 6 months ago
- (CVPR 2022) Automated Progressive Learning for Efficient Training of Vision Transformers☆25Feb 26, 2025Updated last year
- The code of 'Towards Domain-agnostic depth completion'☆27Aug 4, 2022Updated 3 years ago
- [CVPR2026] VideoAuto-R1: Video Auto Reasoning via Thinking Once, Answering Twice☆65Updated this week
- Official implementation of Bifrost-1: Bridging Multimodal LLMs and Diffusion Models with Patch-level CLIP Latents (NeurIPS 2025)☆44Nov 24, 2025Updated 3 months ago
- 📚 A collection of resources and papers on Large Language Models in autonomous driving☆27Oct 30, 2023Updated 2 years ago
- SurfaceSplat: Connecting Surface Reconstruction and Gaussian Splatting☆56Jul 21, 2025Updated 7 months ago
- Fine-Grained Evaluation of Large Vision-Language Models in Autonomous Driving (ICCV 2025)☆36May 29, 2025Updated 9 months ago
- ☆70Oct 19, 2023Updated 2 years ago
- [CVPR 2026] Machine Mental Imagery: Empower Multimodal Reasoning with Latent Visual Tokens☆242Aug 2, 2025Updated 7 months ago
- Code for ReFocus: Visual Editing as a Chain of Thought for Structured Image Understanding [ICML 2025]☆45Jul 22, 2025Updated 7 months ago
- [ICML 2024] Floating Anchor Diffusion Model for Multi-motif Scaffolding☆31Aug 23, 2024Updated last year
- Ming - facilitating advanced multimodal understanding and generation capabilities built upon the Ling LLM.☆630Feb 12, 2026Updated 2 weeks ago