aim-uofa / Omni-R1
Official Repo of Omni-R1: Reinforcement Learning for Omnimodal Reasoning via Two-System Collaboration
☆51 · Updated this week
Alternatives and similar repositories for Omni-R1
Users who are interested in Omni-R1 are comparing it to the repositories listed below
- ACTIVE-O3: Empowering Multimodal Large Language Models with Active Perception via GRPO ☆58 · Updated last week
- [ArXiv 2025] WorldMem: Long-term Consistent World Simulation with Memory ☆152 · Updated last week
- Official implementation for WorldScore: A Unified Evaluation Benchmark for World Generation ☆105 · Updated last month
- A comprehensive list of papers investigating physical cognition in video generation, including papers, code, and related websites. ☆106 · Updated last week
- Aether: Geometric-Aware Unified World Modeling ☆326 · Updated this week
- [CVPR 2025] The code for the paper "Video-3D LLM: Learning Position-Aware Video Representation for 3D Scene Understanding". ☆108 · Updated this week
- [ARXIV’25] Learning Video Generation for Robotic Manipulation with Collaborative Trajectory Control ☆57 · Updated this week
- ☆42 · Updated last week
- GenDoP: Auto-regressive Camera Trajectory Generation as a Director of Photography ☆62 · Updated this week
- DeepVerse: 4D Autoregressive Video Generation as a World Model ☆63 · Updated this week
- ☆129 · Updated 5 months ago
- Code release for "PISA Experiments: Exploring Physics Post-Training for Video Diffusion Models by Watching Stuff Drop" (ICML 2025) ☆32 · Updated 3 weeks ago
- A list of works on video generation towards world model ☆101 · Updated last week
- WISE: A World Knowledge-Informed Semantic Evaluation for Text-to-Image Generation ☆101 · Updated last week
- [ICLR 2025] Layout-Your-3D: Controllable and Precise 3D Generation with 2D Blueprint ☆11 · Updated 3 months ago
- [arXiv 2025] MMSI-Bench: A Benchmark for Multi-Image Spatial Intelligence ☆17 · Updated last week
- Seeing from Another Perspective: Evaluating Multi-View Understanding in MLLMs ☆38 · Updated 2 weeks ago
- Code Release of Harmonizing Visual Representations for Unified Multimodal Understanding and Generation ☆117 · Updated 2 weeks ago
- MetaSpatial leverages reinforcement learning to enhance 3D spatial reasoning in vision-language models (VLMs), enabling more structured, … ☆127 · Updated last month
- [CVPR 2025] Science-T2I: Addressing Scientific Illusions in Image Synthesis ☆55 · Updated last month
- [CVPR 2025 (Oral)] Open implementation of "RandAR" ☆155 · Updated 2 months ago
- Generative Camera Dolly: Extreme Monocular Dynamic Novel View Synthesis (ECCV 2024 Oral) - Official Implementation ☆253 · Updated 7 months ago
- Official implementation of EPiC: Efficient Video Camera Control Learning with Precise Anchor-Video Guidance ☆24 · Updated last week
- Generative World Explorer ☆143 · Updated 6 months ago
- ☆30 · Updated 6 months ago
- Official repo for "GigaTok: Scaling Visual Tokenizers to 3 Billion Parameters for Autoregressive Image Generation" ☆155 · Updated last month
- Multi-SpatialMLLM: Multi-Frame Spatial Understanding with Multi-Modal Large Language Models ☆84 · Updated last week
- [ICML 2025] Streamline Without Sacrifice - Squeeze out Computation Redundancy in LMM ☆15 · Updated 2 weeks ago
- An open-source lightweight game generation paradigm. It includes everything from data processing to model architecture design and playabi… ☆88 · Updated 4 months ago
- Official implementation of Spatial-MLLM: Boosting MLLM Capabilities in Visual-based Spatial Intelligence ☆99 · Updated last week