QwenLM / Qwen3-OmniLinks

Qwen3-omni is a natively end-to-end, omni-modal LLM developed by the Qwen team at Alibaba Cloud, capable of understanding text, audio, images, and video, as well as generating speech in real time.

☆2,699

Alternatives and similar repositories for Qwen3-Omni

Users that are interested in Qwen3-Omni are comparing it to the libraries listed below

Sorting:

MiniMax-AI / MiniMax-M1
MiniMax-M1, the world's first open-weight, large-scale hybrid-attention reasoning model.
☆2,920Updated 3 months ago
XiaomiMiMo / MiMo
MiMo: Unlocking the Reasoning Potential of Language Model – From Pretraining to Posttraining
☆1,596Updated 4 months ago
zai-org / GLM-V
GLM-4.5V and GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning
☆1,710Updated last week
QwenLM / Qwen2.5-Omni
Qwen2.5-Omni is an end-to-end multimodal model by Qwen team at Alibaba Cloud, capable of understanding text, audio, vision, video, and pe…
☆3,728Updated 4 months ago
HumanMLLM / R1-Omni
☆956Updated 6 months ago
ByteDance-Seed / Seed1.5-VL
Seed1.5-VL, a vision-language foundation model designed to advance general-purpose multimodal understanding and reasoning, achieving stat…
☆1,463Updated 4 months ago
MoonshotAI / Kimi-VL
Kimi-VL: Mixture-of-Experts Vision-Language Model for Multimodal Reasoning, Long-Context Understanding, and Strong Agent Capabilities
☆1,071Updated 3 months ago
ByteDance-Seed / Bagel
Open-source unified multimodal model
☆5,160Updated 2 months ago
XiaomiMiMo / MiMo-Audio
MiMo-Audio: Audio Language Models are Few-Shot Learners
☆782Updated last month
MiniMax-AI / MiniMax-01
The official repo of MiniMax-Text-01 and MiniMax-VL-01, large-language-model & vision-language-model based on Linear Attention
☆3,197Updated 3 months ago
Tencent-Hunyuan / Hunyuan-A13B
Tencent Hunyuan A13B (short as Hunyuan-A13B), an innovative and open-source LLM built on a fine-grained MoE architecture.
☆800Updated 3 months ago
ByteDance-Seed / seed-oss
☆828Updated last month
microsoft / Magma
[CVPR 2025] Magma: A Foundation Model for Multimodal AI Agents
☆1,827Updated 2 weeks ago
Gen-Verse / MMaDA
[NeurIPS 2025] MMaDA - Open-Sourced Multimodal Large Diffusion Language Models
☆1,434Updated last week
MoonshotAI / Moonlight
Muon is Scalable for LLM Training
☆1,336Updated 2 months ago
XiaomiMiMo / MiMo-VL
MiMo-VL
☆570Updated 2 months ago
zai-org / GLM-4.5
GLM-4.5: Agentic, Reasoning, and Coding (ARC) Foundation Models
☆3,055Updated last week
DreamLM / Dream
Dream 7B, a large diffusion language model
☆1,018Updated 3 weeks ago
facebookresearch / MILS
Code release for "LLMs can see and hear without any training"
☆452Updated 5 months ago
MoonshotAI / MoBA
MoBA: Mixture of Block Attention for Long-Context LLMs
☆1,941Updated 6 months ago
meituan-longcat / LongCat-Flash-Chat
☆1,160Updated this week
facebookresearch / spiritlm
Inference code for the paper "Spirit-LM Interleaved Spoken and Written Language Model".
☆924Updated 11 months ago
Tencent-Hunyuan / Tencent-Hunyuan-Large
☆1,583Updated 10 months ago
ByteDance-Seed / m3-agent
☆1,033Updated 2 weeks ago
kyutai-labs / delayed-streams-modeling
Kyutai's Speech-To-Text and Text-To-Speech models based on the Delayed Streams Modeling framework.
☆2,468Updated last month
ML-GSAI / LLaDA
Official PyTorch implementation for "Large Language Diffusion Models"
☆3,079Updated last week
huggingface / smollm
Everything about the SmolLM and SmolVLM family of models
☆3,332Updated last month
AIDC-AI / Ovis
A novel Multimodal Large Language Model (MLLM) architecture, designed to structurally align visual and textual embeddings.
☆1,373Updated last month
SakanaAI / text-to-lora
Hypernetworks that adapt LLMs for specific benchmark tasks using only textual task description as the input
☆893Updated 4 months ago
rhymes-ai / Aria
Codebase for Aria - an Open Multimodal Native MoE
☆1,071Updated 9 months ago