QwenLM/Qwen3-ASR

Qwen3-ASR is an open-source series of ASR models developed by the Qwen team at Alibaba Cloud, supporting stable multilingual speech/music/song recognition, language detection and timestamp prediction.

☆3,214

Alternatives and similar repositories for Qwen3-ASR

Users who are interested in Qwen3-ASR are comparing it to the libraries listed below.

QwenAudio / Fun-ASR
Open-source LLM-based ASR model family for Chinese, dialect, accent, and multilingual speech, with FunASR, vLLM, streaming, and llama.cpp…
☆1,425 · Updated this week
FireRedTeam / FireRedASR2S
A SOTA Industrial-Grade All-in-One ASR system with ASR, VAD, LID, and Punc modules. FireRedASR2 supports Chinese (Mandarin, 20+ dialects/…
☆614 · Jun 2, 2026 · Updated last month
FireRedTeam / FireRedASR
Open-source industrial-grade ASR models supporting Mandarin, Chinese dialects and English, achieving a new SOTA on public Mandarin ASR be…
☆1,940 · Feb 25, 2026 · Updated 5 months ago
QwenLM / Qwen3-TTS
Qwen3-TTS is an open-source series of TTS models developed by the Qwen team at Alibaba Cloud, supporting stable, expressive, and streamin…
☆12,590 · Mar 17, 2026 · Updated 4 months ago
QwenLM / Qwen3-ASR-Toolkit
Official Python toolkit for the Qwen3-ASR API. Parallel high‑throughput calls, robust long‑audio transcription, multi‑sample‑rate support…
☆981 · Feb 5, 2026 · Updated 5 months ago
modelscope / FunASR
Open-source speech recognition toolkit for training, inference, streaming ASR, VAD, punctuation, speaker diarization pipelines, and OpenA…
☆19,459 · Updated this week
zai-org / GLM-ASR
GLM-ASR-Nano: A robust, open-source speech recognition model with 1.5B parameters
☆836 · Mar 6, 2026 · Updated 4 months ago
QwenAudio / SenseVoice
Open-source SenseVoiceSmall model for Mandarin, Cantonese, English, Japanese, and Korean ASR, language ID, emotion recognition, and audio…
☆8,935 · Updated this week
yuekaizhang / Fun-ASR-vllm
Fun-ASR is an end-to-end speech recognition large model launched by Tongyi Lab.
☆107 · Jul 7, 2026 · Updated 2 weeks ago
QwenLM / Qwen3-Omni
Qwen3-Omni is a natively end-to-end, omni-modal LLM developed by the Qwen team at Alibaba Cloud, capable of understanding text, audio, im…
☆3,904 · Apr 23, 2026 · Updated 3 months ago
wenet-e2e / west
We Speech Toolkit, LLM-based Speech Toolkit for Speech Understanding, Generation, and Interaction
☆206 · Jul 17, 2026 · Updated last week
facebookresearch / omnilingual-asr
Omnilingual ASR: Open-Source Multilingual Speech Recognition for 1600+ Languages
☆2,859 · Dec 30, 2025 · Updated 6 months ago
stepfun-ai / Step-Audio2
Step-Audio 2 is an end-to-end multi-modal large language model designed for industry-strength audio understanding and speech conversation…
☆1,487 · Mar 16, 2026 · Updated 4 months ago
XiaomiMiMo / MiMo-V2.5-ASR
Robust Speech Recognition Across Languages, Dialects, and Complex Acoustic Scenarios
☆317 · Apr 23, 2026 · Updated 3 months ago
QwenAudio / Fun-Audio-Chat
Fun-Audio-Chat is a Large Audio Language Model built for natural, low-latency voice interactions.
☆985 · Feb 27, 2026 · Updated 4 months ago
XiaomiMiMo / MiMo-Audio
MiMo-Audio: Audio Language Models are Few-Shot Learners
☆1,066 · Jun 17, 2026 · Updated last month
zai-org / GLM-TTS
GLM-TTS: Controllable & Emotion-Expressive Zero-shot TTS with Multi-Reward Reinforcement Learning
☆1,044 · Apr 10, 2026 · Updated 3 months ago
QwenAudio / CosyVoice
Multilingual large voice generation model, providing full-stack inference, training, and deployment capabilities.
☆22,382 · May 25, 2026 · Updated last month
k2-fsa / ZipVoice
Fast and High-Quality Zero-Shot Text-to-Speech with Flow Matching
☆1,018 · Dec 2, 2025 · Updated 7 months ago
snakers4 / silero-vad
Silero VAD: pre-trained enterprise-grade Voice Activity Detector
☆9,661 · Jul 16, 2026 · Updated last week
Soul-AILab / SoulX-Duplug
Plug-and-play streaming semantic VAD for real-time full-duplex spoken dialogue systems.
☆275 · Jul 17, 2026 · Updated last week
inclusionAI / Ming-UniAudio
Ming-UniAudio: Speech LLM for Joint Understanding, Generation and Editing with Unified Representation
☆450 · Nov 27, 2025 · Updated 7 months ago
FireRedTeam / FireRedVAD
A SOTA Industrial-Grade Voice Activity Detection & Audio Event Detection, supporting 100+ languages, outperforming Silero-VAD, TEN-VAD, F…
☆472 · May 6, 2026 · Updated 2 months ago
xingchensong / S3Tokenizer
Reverse Engineering of Supervised Semantic Speech Tokenizer (S3Tokenizer) proposed in CosyVoice
☆521 · Dec 22, 2025 · Updated 7 months ago
Soul-AILab / SoulX-Transcriber
An end-to-end framework for multi-speaker transcription that jointly models who spoke, when, and what.
☆284 · Jun 22, 2026 · Updated last month
modelscope / ClearerVoice-Studio
An AI-Powered Speech Processing Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Enhancement, Separation, and Target Spe…
☆4,329 · Aug 14, 2025 · Updated 11 months ago
TEN-framework / ten-vad
Voice Activity Detector (VAD): low-latency, high-performance and lightweight
☆2,204 · Feb 2, 2026 · Updated 5 months ago
modelscope / 3D-Speaker
A Repository for Single- and Multi-modal Speaker Verification, Speaker Recognition and Speaker Diarization
☆3,069 · Dec 8, 2025 · Updated 7 months ago
ASLP-lab / Easy-Turn
Open-Source Turn-Taking Detection Model and Dataset for Full-Duplex Spoken Dialogue Systems
☆122 · Jan 25, 2026 · Updated 6 months ago
ASLP-lab / WenetSpeech-Yue
A Large-scale Cantonese Speech Corpus with Multi-dimensional Annotation
☆344 · Jun 6, 2026 · Updated last month
ASLP-lab / VoiceSculptor
An instruct text-to-speech solution based on LLaSA and CosyVoice2 developed by the ASLP lab and collaborators.
☆250 · Feb 26, 2026 · Updated 4 months ago
Quantatirsk / qwen3-asr
All-in-one Qwen3-ASR server, compatible with the OpenAI API
☆320 · Jul 14, 2026 · Updated last week
xingchensong / FlashCosyVoice
FlashCosyVoice: A lightweight vLLM implementation built from scratch for CosyVoice.
☆250 · Feb 25, 2026 · Updated 5 months ago
k2-fsa / sherpa-onnx
Speech-to-text, text-to-speech, speaker diarization, speech enhancement, source separation, and VAD using next-gen Kaldi with onnxruntime…
☆13,767 · Updated this week
k2-fsa / Flow2GAN
Hybrid Flow Matching and GAN with Multi-Resolution Network for Few-Step High-Fidelity Audio Generation
☆145 · Mar 8, 2026 · Updated 4 months ago
inclusionAI / Ming-omni-tts
Ming-omni-tts: Simple and Efficient Unified Generation of Speech, Music, and Sound with Precise Control
☆263 · Feb 26, 2026 · Updated 4 months ago
Gilgamesh-J / X-ASR
X-ASR is a series of automatic speech recognition models based on the icefall framework, focusing on streaming ASR and low-latency deploy…
☆145 · Jul 8, 2026 · Updated 2 weeks ago
stepfun-ai / Step-Audio-R1
☆690 · Apr 29, 2026 · Updated 2 months ago
wenet-e2e / wespeaker
Research and Production Oriented Speaker Verification, Recognition and Diarization Toolkit
☆1,365 · Jul 8, 2026 · Updated 2 weeks ago