HumanMLLM / R1-Omni
☆743 Updated this week
Alternatives and similar repositories for R1-Omni:
Users who are interested in R1-Omni are comparing it to the repositories listed below.
- ✨✨VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction ☆2,167 Updated last month
- Explore the Multimodal “Aha Moment” on 2B Model ☆524 Updated last week
- Frontier Multimodal Foundation Models for Image and Video Understanding ☆664 Updated this week
- A novel Multimodal Large Language Model (MLLM) architecture, designed to structurally align visual and textual embeddings. ☆838 Updated last month
- [CVPR 2025] Open-source, End-to-end, Vision-Language-Action model for GUI Agent & Computer Use. ☆1,117 Updated last week
- Codebase for Aria - an Open Multimodal Native MoE ☆1,025 Updated 2 months ago
- An Open Large Reasoning Model for Real-World Solutions ☆1,475 Updated 3 weeks ago
- ☆1,347 Updated 4 months ago
- Towards Open-source GPT-4o with Vision, Speech and Duplex Capabilities. ☆1,688 Updated 2 months ago
- R1-onevision, a visual language model capable of deep CoT reasoning. ☆464 Updated last week
- ☆130 Updated last month
- An open-sourced end-to-end VLM-based GUI Agent ☆837 Updated last month
- The official repo of Qwen2-Audio chat & pretrained large audio language model proposed by Alibaba Cloud. ☆1,618 Updated 7 months ago
- ☆218 Updated last month
- Muon is Scalable for LLM Training ☆974 Updated 3 weeks ago
- MoBA: Mixture of Block Attention for Long-Context LLMs ☆1,687 Updated 2 weeks ago
- [CVPR 2025] Magma: A Foundation Model for Multimodal AI Agents ☆1,481 Updated this week
- A fork to add multimodal model training to open-r1 ☆1,108 Updated last month
- Scalable RL solution for advanced reasoning of language models ☆1,419 Updated last week
- EasyR1: An Efficient, Scalable, Multi-Modality RL Training Framework based on veRL ☆1,681 Updated this week
- ✨✨Freeze-Omni: A Smart and Low Latency Speech-to-speech Dialogue Model with Frozen LLM ☆294 Updated 2 months ago
- Large Reasoning Models ☆799 Updated 3 months ago
- Official Repo for Open-Reasoner-Zero ☆1,667 Updated 3 weeks ago
- HumanOmni ☆129 Updated 2 weeks ago
- Parsing-free RAG supported by VLMs ☆636 Updated last month
- "VideoRAG: Retrieval-Augmented Generation with Extreme Long-Context Videos" ☆490 Updated 3 weeks ago
- ☆518 Updated this week
- An Open-source RL System from ByteDance Seed and Tsinghua AIR ☆767 Updated this week
- ☆365 Updated 3 weeks ago
- Witness the aha moment of VLM with less than $3. ☆3,376 Updated 3 weeks ago