QwenLM/Qwen3-ASR-Toolkit

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/QwenLM/Qwen3-ASR-Toolkit)

QwenLM / Qwen3-ASR-Toolkit

Official Python toolkit for the Qwen3-ASR API. Parallel high‑throughput calls, robust long‑audio transcription, multi‑sample‑rate support.

☆981

Alternatives and similar repositories for Qwen3-ASR-Toolkit

Users that are interested in Qwen3-ASR-Toolkit are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

QwenLM / Qwen3-ASR
View on GitHub
Qwen3-ASR is an open-source series of ASR models developed by the Qwen team at Alibaba Cloud, supporting stable multilingual speech/music…
☆3,214Jun 26, 2026Updated 3 weeks ago
QwenLM / Qwen3-Omni
View on GitHub
Qwen3-omni is a natively end-to-end, omni-modal LLM developed by the Qwen team at Alibaba Cloud, capable of understanding text, audio, im…
☆3,904Apr 23, 2026Updated 3 months ago
XiaomiMiMo / MiMo-Audio
View on GitHub
MiMo-Audio: Audio Language Models are Few-Shot Learners
☆1,066Jun 17, 2026Updated last month
QwenAudio / Fun-ASR
View on GitHub
Open-source LLM-based ASR model family for Chinese, dialect, accent, and multilingual speech, with FunASR, vLLM, streaming, and llama.cpp…
☆1,425Updated this week
zai-org / GLM-ASR
View on GitHub
GLM-ASR-Nano: A robust, open-source speech recognition model with 1.5B parameters
☆836Mar 6, 2026Updated 4 months ago
Managed Database hosting by DigitalOcean • Ad
PostgreSQL, MySQL, MongoDB, Kafka, Valkey, and OpenSearch available. Automatically scale up storage and focus on building your apps.
QwenLM / Qwen3Guard
View on GitHub
Qwen3Guard is a multilingual guardrail model series developed by the Qwen team at Alibaba Cloud.
☆487Oct 21, 2025Updated 9 months ago
inclusionAI / Ming-UniAudio
View on GitHub
Ming-UniAudio: Speech LLM for Joint Understanding, Generation and Editing with Unified Representation
☆450Nov 27, 2025Updated 7 months ago
FireRedTeam / FireRedASR2S
View on GitHub
A SOTA Industrial-Grade All-in-One ASR system with ASR, VAD, LID, and Punc modules. FireRedASR2 supports Chinese (Mandarin, 20+ dialects/…
☆614Jun 2, 2026Updated last month
FireRedTeam / FireRedASR
View on GitHub
Open-source industrial-grade ASR models supporting Mandarin, Chinese dialects and English, achieving a new SOTA on public Mandarin ASR be…
☆1,940Feb 25, 2026Updated 5 months ago
stepfun-ai / Step-Audio2
View on GitHub
Step-Audio 2 is an end-to-end multi-modal large language model designed for industry-strength audio understanding and speech conversation…
☆1,487Mar 16, 2026Updated 4 months ago
facebookresearch / omnilingual-asr
View on GitHub
Omnilingual ASR Open-Source Multilingual SpeechRecognition for 1600+ Languages
☆2,859Dec 30, 2025Updated 6 months ago
wenet-e2e / west
View on GitHub
We Speech Toolkit, LLM based Speech Toolkit for Speech Understanding, Generation, and Interaction
☆206Jul 17, 2026Updated last week
meituan-longcat / LongCat-Audio-Codec
View on GitHub
LongCat Audio Tokenizer and Detokenizer
☆301May 9, 2026Updated 2 months ago
MoonshotAI / Kimi-Audio-Evalkit
View on GitHub
☆169Nov 20, 2025Updated 8 months ago
Managed Database hosting by DigitalOcean • Ad
PostgreSQL, MySQL, MongoDB, Kafka, Valkey, and OpenSearch available. Automatically scale up storage and focus on building your apps.
stepfun-ai / Step-Audio-R1
View on GitHub
☆690Apr 29, 2026Updated 2 months ago
yuekaizhang / Fun-ASR-vllm
View on GitHub
Fun-ASR is an end-to-end speech recognition large model launched by Tongyi Lab.
☆107Jul 7, 2026Updated 2 weeks ago
allenai / OLMoASR
View on GitHub
An open-source implementation of Whisper
☆492Oct 29, 2025Updated 8 months ago
modelscope / FunASR
View on GitHub
Open-source speech recognition toolkit for training, inference, streaming ASR, VAD, punctuation, speaker diarization pipelines, and OpenA…
☆19,459Updated this week
XiaomiMiMo / MiMo-V2.5-ASR
View on GitHub
Robust Speech Recognition Across Languages, Dialects, and Complex Acoustic Scenarios
☆317Apr 23, 2026Updated 3 months ago
k2-fsa / ZipVoice
View on GitHub
Fast and High-Quality Zero-Shot Text-to-Speech with Flow Matching
☆1,018Dec 2, 2025Updated 7 months ago
QwenAudio / Fun-Audio-Chat
View on GitHub
Fun-Audio-Chat is a Large Audio Language Model built for natural, low-latency voice interactions.
☆985Feb 27, 2026Updated 4 months ago
Soul-AILab / SoulX-Duplug
View on GitHub
Plug-and-play streaming semantic VAD for real-time full-duplex spoken dialogue systems.
☆275Jul 17, 2026Updated last week
QwenAudio / SenseVoice
View on GitHub
Open-source SenseVoiceSmall model for Mandarin, Cantonese, English, Japanese, and Korean ASR, language ID, emotion recognition, and audio…
☆8,935Updated this week
1-Click AI Models by DigitalOcean Gradient • Ad
Deploy popular AI models on DigitalOcean Gradient GPU virtual machines with just a single click. Zero configuration with optimized deployments.
zai-org / GLM-TTS
View on GitHub
GLM-TTS: Controllable & Emotion-Expressive Zero-shot TTS with Multi-Reward Reinforcement Learning
☆1,044Apr 10, 2026Updated 3 months ago
modelscope / ClearerVoice-Studio
View on GitHub
An AI-Powered Speech Processing Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Enhancement, Separation, and Target Spe…
☆4,329Aug 14, 2025Updated 11 months ago
meituan-longcat / LongCat-Flash-Omni
View on GitHub
This is the official repo for the paper "LongCat-Flash-Omni Technical Report"
☆499May 9, 2026Updated 2 months ago
yeahhe365 / ASR-Studio
View on GitHub
Multi-provider ASR web studio (Qwen, Doubao, Gemini, NIM, OpenAI-compatible & more) with recording, batch queue, PWA, local cache, and be…
☆273Jul 16, 2026Updated last week
MrSupW / ContextASR-Bench
View on GitHub
A Massive Contextual Speech Recognition Benchmark.
☆107Aug 6, 2025Updated 11 months ago
FireRedTeam / FireRedVAD
View on GitHub
A SOTA Industrial-Grade Voice Activity Detection & Audio Event Detection, supporting 100+ languages, outperforming Silero-VAD, TEN-VAD, F…
☆472May 6, 2026Updated 2 months ago
QwenLM / Qwen2-Audio
View on GitHub
The official repo of Qwen2-Audio chat & pretrained large audio language model proposed by Alibaba Cloud.
☆2,093Apr 21, 2025Updated last year
lmxue / Audio-FLAN
View on GitHub
Audio-FLAN
☆161Sep 23, 2025Updated 10 months ago
TEN-framework / ten-vad
View on GitHub
Voice Activity Detector (VAD) : low-latency, high-performance and lightweight
☆2,204Feb 2, 2026Updated 5 months ago
GPUs on demand by Runpod - Special Offer Available • Ad
Run AI, ML, and HPC workloads on powerful cloud GPUs—without limits or wasted spend. Deploy GPUs in under a minute and pay by the second.
QwenAudio / CV3-Eval
View on GitHub
☆187Aug 25, 2025Updated 11 months ago
TencentARC / AudioStory
View on GitHub
AudioStory: Generating Long-Form Narrative Audio with Large Language Models
☆301Sep 21, 2025Updated 10 months ago
halsay / ASR-TTS-paper-daily
View on GitHub
Update ASR paper everyday
☆513May 16, 2026Updated 2 months ago
OpenMOSS / MOSS-Audio-Tokenizer
View on GitHub
MOSS-Audio-Tokenizer is a Causal Transformer-based audio tokenizer built on the CAT architecture. Trained on 3M hours of diverse audio, i…
☆248Jun 16, 2026Updated last month
xzf-thu / Voices-in-the-Wild-Bench
View on GitHub
☆28May 22, 2026Updated 2 months ago
ASLP-lab / VoiceSculptor
View on GitHub
An instruct text-to-speech solution based on LLaSA and CosyVoice2 developed by the ASLP lab and collaborators.
☆250Feb 26, 2026Updated 4 months ago
QwenAudio / CosyVoice
View on GitHub
Multi-lingual large voice generation model, providing inference, training and deployment full-stack ability.
☆22,382May 25, 2026Updated last month