HumanMLLM/R1-Omni


☆1,020

Alternatives and similar repositories for R1-Omni

Users who are interested in R1-Omni are comparing it to the repositories listed below.

HumanMLLM / HumanOmni
HumanOmni
☆240 · Mar 10, 2025 · Updated last year
HumanMLLM / HumanOmniV2
☆161 · Jul 31, 2025 · Updated 11 months ago
StarsfieldAI / R1-V
Witness the aha moment of VLM with less than $3.
☆4,065 · May 19, 2025 · Updated last year
QwenLM / Qwen2.5-Omni
Qwen2.5-Omni is an end-to-end multimodal model by Qwen team at Alibaba Cloud, capable of understanding text, audio, vision, video, and pe…
☆4,043 · Jun 12, 2025 · Updated last year
zeroQiaoba / AffectGPT
EMER, OV-MER (ICML25), AffectGPT (ICML25, Oral), EmoPrefer (ICLR26)
☆410 · Feb 24, 2026 · Updated 4 months ago
Liuziyu77 / Visual-RFT
Official repository of 'Visual-RFT: Visual Reinforcement Fine-Tuning' & 'Visual-ARFT: Visual Agentic Reinforcement Fine-Tuning'
☆2,264 · Oct 29, 2025 · Updated 8 months ago
VITA-MLLM / VITA
✨✨[NeurIPS 2025] VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction
☆2,520 · Mar 28, 2025 · Updated last year
ZebangCheng / Emotion-LLaMA
Emotion-LLaMA: Multimodal Emotion Recognition and Reasoning with Instruction Tuning
☆604 · Updated this week
om-ai-lab / VLM-R1
Solve Visual Understanding with Reinforced VLMs
☆6,013 · Jul 7, 2026 · Updated 2 weeks ago
tulerfeng / Video-R1
Video-R1: Reinforcing Video Reasoning in MLLMs [🔥the first paper to explore R1 for video]
☆881 · Dec 14, 2025 · Updated 7 months ago
turningpoint-ai / VisualThinker-R1-Zero
Explore the Multimodal “Aha Moment” on 2B Model
☆624 · Mar 18, 2025 · Updated last year
QwenLM / Qwen2-Audio
The official repo of Qwen2-Audio chat & pretrained large audio language model proposed by Alibaba Cloud.
☆2,092 · Apr 21, 2025 · Updated last year
OpenGVLab / InternVL
[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal dialogue model approaching GPT-4o performance
☆10,099 · Sep 22, 2025 · Updated 10 months ago
HumanMLLM / CoGenAV
☆64 · Jul 1, 2025 · Updated last year
HumanMLLM / ViSpeak
(ICCV2025) Official repository of paper "ViSpeak: Visual Instruction Feedback in Streaming Videos"
☆53 · Jul 1, 2025 · Updated last year
Fancy-MLLM / R1-Onevision
R1-onevision, a visual language model capable of deep CoT reasoning.
☆581 · Apr 13, 2025 · Updated last year
Wang-Xiaodong1899 / Open-R1-Video
✨First Open-Source R1-like Video-LLM [2025/02/18]
☆382 · Jul 1, 2026 · Updated 3 weeks ago
QwenLM / Qwen3-VL
Qwen3-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.
☆19,645 · Jan 30, 2026 · Updated 5 months ago
stepfun-ai / Step-Audio
☆34 · Mar 16, 2026 · Updated 4 months ago
HumanMLLM / Omni-Emotion
☆22 · Jan 17, 2025 · Updated last year
zeroQiaoba / gpt4v-emotion
GPT-4V with Emotion
☆97 · Dec 8, 2023 · Updated 2 years ago
zai-org / GLM-4-Voice
GLM-4-Voice | End-to-end Chinese-English spoken dialogue model
☆3,208 · Dec 5, 2024 · Updated last year
EvolvingLMMs-Lab / open-r1-multimodal
A fork to add multimodal model training to open-r1
☆1,591 · Feb 8, 2025 · Updated last year
Jiaxing-star / LLaVA-Octopus
☆11 · Jan 8, 2025 · Updated last year
zeroQiaoba / MERTools
Toolkits for Multimodal Emotion Recognition
☆325 · Jun 5, 2026 · Updated last month
ModalMinds / MM-EUREKA
MM-EUREKA: Exploring the Frontiers of Multimodal Reasoning with Rule-based Reinforcement Learning
☆770 · Sep 7, 2025 · Updated 10 months ago
baaivision / Emu3
Next-Token Prediction is All You Need
☆2,432 · Jan 12, 2026 · Updated 6 months ago
hiyouga / EasyR1
EasyR1: An Efficient, Scalable, Multi-Modality RL Training Framework based on veRL
☆5,079 · Jul 15, 2026 · Updated last week
MoonshotAI / Kimi-VL
Kimi-VL: Mixture-of-Experts Vision-Language Model for Multimodal Reasoning, Long-Context Understanding, and Strong Agent Capabilities
☆1,206 · Jul 15, 2025 · Updated last year
modelscope / ms-swift
Use PEFT or Full-parameter to CPT/SFT/DPO/GRPO 600+ LLMs (Qwen3.6, DeepSeek-V4, GLM-5.1, InternLM3, Llama4, ...) and 300+ MLLMs (Qwen3-VL…
☆14,887 · Updated this week
huggingface / open-r1
Fully open reproduction of DeepSeek-R1
☆26,412 · Apr 2, 2026 · Updated 3 months ago
LLaVA-VL / LLaVA-NeXT
☆4,711 · Jun 15, 2026 · Updated last month
ByteDance-Seed / Seed1.5-VL
Seed1.5-VL, a vision-language foundation model designed to advance general-purpose multimodal understanding and reasoning, achieving stat…
☆1,583 · Jun 14, 2025 · Updated last year
OpenBMB / MiniCPM-V
A Pocket-Sized MLLM for Ultra-Efficient Image and Video Understanding on Your Phone
☆25,965 · Jun 25, 2026 · Updated 3 weeks ago
QwenLM / Qwen3-Omni
Qwen3-omni is a natively end-to-end, omni-modal LLM developed by the Qwen team at Alibaba Cloud, capable of understanding text, audio, im…
☆3,903 · Apr 23, 2026 · Updated 3 months ago
gpt-omni / mini-omni
open-source multimodal large language model that can hear, talk while thinking. Featuring real-time end-to-end speech input and streaming…
☆3,563 · Nov 5, 2024 · Updated last year
aimmemotion / EmoVIT
[CVPR 2024] EmoVIT: Revolutionizing Emotion Insights with Visual Instruction Tuning
☆40 · Apr 20, 2025 · Updated last year
UCSC-VLAA / VLAA-Thinking
[TMLR 25] SFT or RL? An Early Investigation into Training R1-Like Reasoning Large Vision-Language Models
☆148 · Oct 10, 2025 · Updated 9 months ago
DAMO-NLP-SG / VideoLLaMA2
VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs
☆1,304 · Jan 23, 2025 · Updated last year