HumanMLLM / R1-Omni
☆832 · Updated 3 weeks ago
Alternatives and similar repositories for R1-Omni:
Users who are interested in R1-Omni are comparing it to the repositories listed below:
- Kimi-VL: Mixture-of-Experts Vision-Language Model for Multimodal Reasoning, Long-Context Understanding, and Strong Agent Capabilities ☆685 · Updated this week
- Frontier Multimodal Foundation Models for Image and Video Understanding ☆741 · Updated this week
- ✨✨VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction ☆2,225 · Updated 3 weeks ago
- Qwen2.5-Omni is an end-to-end multimodal model by Qwen team at Alibaba Cloud, capable of understanding text, audio, vision, video, and pe… ☆2,556 · Updated this week
- Explore the Multimodal “Aha Moment” on 2B Model ☆572 · Updated last month
- Towards Open-source GPT-4o with Vision, Speech and Duplex Capabilities. ☆1,717 · Updated 3 months ago
- R1-onevision, a visual language model capable of deep CoT reasoning. ☆500 · Updated last week
- A fork to add multimodal model training to open-r1 ☆1,212 · Updated 2 months ago
- ☆223 · Updated last month
- A novel Multimodal Large Language Model (MLLM) architecture, designed to structurally align visual and textual embeddings. ☆886 · Updated 3 weeks ago
- The official repo of Qwen2-Audio chat & pretrained large audio language model proposed by Alibaba Cloud. ☆1,680 · Updated 8 months ago
- HumanOmni ☆152 · Updated last month
- MM-EUREKA: Exploring Visual Aha Moment with Rule-based Large-scale Reinforcement Learning ☆556 · Updated this week
- ☆662 · Updated this week
- MoBA: Mixture of Block Attention for Long-Context LLMs ☆1,746 · Updated 2 weeks ago
- Scalable RL solution for advanced reasoning of language models ☆1,488 · Updated last month
- An open-sourced end-to-end VLM-based GUI Agent ☆904 · Updated 2 weeks ago
- An Open Large Reasoning Model for Real-World Solutions ☆1,484 · Updated last month
- ☆1,351 · Updated 4 months ago
- [CVPR 2025] Magma: A Foundation Model for Multimodal AI Agents ☆1,595 · Updated this week
- GPT-4o-level, real-time spoken dialogue system. ☆314 · Updated 2 months ago
- Muon is Scalable for LLM Training ☆1,022 · Updated 3 weeks ago
- Next-Token Prediction is All You Need ☆2,076 · Updated last month
- An Open-source RL System from ByteDance Seed and Tsinghua AIR ☆1,141 · Updated last week
- Video-R1: Reinforcing Video Reasoning in MLLMs [🔥the first paper to explore R1 for video] ☆446 · Updated last week
- VideoRAG: Retrieval-Augmented Generation with Extreme Long-Context Videos ☆594 · Updated 3 weeks ago
- Repo for Benchmarking Multimodal Retrieval Augmented Generation with Dynamic VQA Dataset and Self-adaptive Planning Agent ☆297 · Updated last month
- ViDoRAG: Visual Document Retrieval-Augmented Generation via Dynamic Iterative Reasoning Agents ☆450 · Updated last month
- ☆659 · Updated this week
- Large Reasoning Models ☆802 · Updated 4 months ago