QwenLM / Qwen3-Omni
Qwen3-Omni is a natively end-to-end, omni-modal LLM developed by the Qwen team at Alibaba Cloud. It understands text, audio, images, and video, and can generate speech in real time.
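As a rough illustration of how an omni-modal chat model like this is typically driven through a Hugging Face `transformers`-style conversation format, the sketch below builds a single user turn that mixes image, audio, and text. The checkpoint name, model classes, and file names are assumptions for illustration only; consult the Qwen3-Omni README for the exact API before running the commented-out model calls.

```python
# Sketch of a multi-modal conversation turn in the transformers chat-template
# style. Only the offline data structure is executed here; the model-loading
# lines are commented out because they are assumptions and require a download.

conversation = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": "photo.jpg"},     # hypothetical local file
            {"type": "audio", "audio": "question.wav"},  # hypothetical local file
            {"type": "text", "text": "Describe the image and answer the audio question."},
        ],
    }
]

# Assumed, unverified usage (check the repo for the real class/checkpoint names):
# from transformers import AutoProcessor
# processor = AutoProcessor.from_pretrained("Qwen/Qwen3-Omni-...")  # name elided
# inputs = processor.apply_chat_template(conversation, tokenize=True,
#                                        return_tensors="pt")
# output_ids = model.generate(**inputs, max_new_tokens=128)

# Verifiable offline: one turn carries three modalities side by side.
modalities = [part["type"] for part in conversation[0]["content"]]
print(modalities)  # ['image', 'audio', 'text']
```

The point of the structure is that "natively end-to-end" models consume all modalities in a single interleaved message list rather than routing each modality through a separate encoder pipeline.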
☆3,206 Updated 3 months ago
Alternatives and similar repositories for Qwen3-Omni
Users interested in Qwen3-Omni are comparing it to the libraries listed below.
- Qwen2.5-Omni is an end-to-end multimodal model by the Qwen team at Alibaba Cloud, capable of understanding text, audio, vision, video, and pe… ☆3,868 Updated 7 months ago
- GLM-4.6V/4.5V/4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning ☆2,112 Updated 3 weeks ago
- MiniMax-M1, the world's first open-weight, large-scale hybrid-attention reasoning model. ☆3,032 Updated 6 months ago
- ☆989 Updated 9 months ago
- Seed1.5-VL, a vision-language foundation model designed to advance general-purpose multimodal understanding and reasoning, achieving stat… ☆1,515 Updated 6 months ago
- Kimi-VL: Mixture-of-Experts Vision-Language Model for Multimodal Reasoning, Long-Context Understanding, and Strong Agent Capabilities ☆1,135 Updated 5 months ago
- MiMo-Audio: Audio Language Models are Few-Shot Learners ☆944 Updated 3 months ago
- Step-Audio 2 is an end-to-end multi-modal large language model designed for industry-strength audio understanding and speech conversation… ☆1,290 Updated 3 months ago
- Omnilingual ASR: Open-Source Multilingual Speech Recognition for 1600+ Languages ☆2,556 Updated last week
- MiMo: Unlocking the Reasoning Potential of Language Model – From Pretraining to Posttraining ☆1,896 Updated 7 months ago
- A framework for efficient model inference with omni-modality models ☆1,977 Updated last week
- The repository provides code for running inference with the Meta Segment Anything Audio Model (SAM-Audio), links for downloading the trai… ☆2,841 Updated last week
- Open-source unified multimodal model ☆5,539 Updated 2 months ago
- The official repo of MiniMax-Text-01 and MiniMax-VL-01, a large language model and vision-language model based on Linear Attention ☆3,290 Updated 6 months ago
- [CVPR 2025] Magma: A Foundation Model for Multimodal AI Agents ☆1,889 Updated 3 months ago
- Muon is Scalable for LLM Training ☆1,397 Updated 5 months ago
- ☆856 Updated 3 months ago
- MiMo-VL ☆619 Updated 4 months ago
- ☆1,257 Updated last month
- MMaDA - Open-Sourced Multimodal Large Diffusion Language Models ☆1,549 Updated last month
- ☆1,405 Updated last month
- TurboDiffusion: 100–200× Acceleration for Video Diffusion Models ☆3,087 Updated last week
- GLM-4.5: Agentic, Reasoning, and Coding (ARC) Foundation Models ☆3,678 Updated 2 weeks ago
- Tencent Hunyuan A13B (Hunyuan-A13B for short), an innovative and open-source LLM built on a fine-grained MoE architecture. ☆810 Updated 6 months ago
- ☆1,696 Updated 3 months ago
- A novel Multimodal Large Language Model (MLLM) architecture, designed to structurally align visual and textual embeddings. ☆1,427 Updated 3 months ago
- ☆1,194 Updated 2 months ago
- Renderer for the harmony response format to be used with gpt-oss ☆4,124 Updated 3 weeks ago
- A Scientific Multimodal Foundation Model ☆623 Updated 3 months ago
- ☆1,260 Updated 3 weeks ago