modelscope/FunASR

Open-source speech recognition toolkit for training, inference, streaming ASR, VAD, punctuation, speaker diarization pipelines, and OpenAI-compatible/MCP serving.

☆19,256

Alternatives and similar repositories for FunASR

Users who are interested in FunASR are comparing it to the libraries listed below.

FunAudioLLM / SenseVoice
Open-source SenseVoiceSmall model for Mandarin, Cantonese, English, Japanese, and Korean ASR, language ID, emotion recognition, and audio…
☆8,864 · Updated this week
FunAudioLLM / CosyVoice
Multi-lingual large voice generation model, providing full-stack inference, training, and deployment capabilities.
☆22,188 · May 25, 2026 · Updated last month
fishaudio / fish-speech
SOTA Open Source TTS
☆31,273 · Jun 9, 2026 · Updated last month
2noise / ChatTTS
A generative speech model for daily dialogue.
☆39,619 · Apr 10, 2026 · Updated 3 months ago
wenet-e2e / wenet
Production First and Production Ready End-to-End Speech Recognition Toolkit
☆5,170 · Jun 15, 2026 · Updated last month
modelscope / 3D-Speaker
A Repository for Single- and Multi-modal Speaker Verification, Speaker Recognition and Speaker Diarization
☆3,048 · Dec 8, 2025 · Updated 7 months ago
k2-fsa / sherpa-onnx
Speech-to-text, text-to-speech, speaker diarization, speech enhancement, source separation, and VAD using next-gen Kaldi with onnxruntime…
☆13,580 · Updated this week
PaddlePaddle / PaddleSpeech
Easy-to-use Speech Toolkit including Self-Supervised Learning model, SOTA/Streaming ASR with punctuation, Streaming TTS with text fronten…
☆12,645 · Jun 21, 2026 · Updated 3 weeks ago
FireRedTeam / FireRedASR
Open-source industrial-grade ASR models supporting Mandarin, Chinese dialects and English, achieving a new SOTA on public Mandarin ASR be…
☆1,935 · Feb 25, 2026 · Updated 4 months ago
RVC-Boss / GPT-SoVITS
1 min voice data can also be used to train a good TTS model! (few shot voice cloning)
☆59,806 · Updated this week
snakers4 / silero-vad
Silero VAD: pre-trained enterprise-grade Voice Activity Detector
☆9,591 · Jul 3, 2026 · Updated 2 weeks ago
modelscope / FunClip
FunASR-powered video transcription, subtitle generation, and LLM-assisted clipping tool with a local Gradio UI.
☆5,922 · Updated this week
SYSTRAN / faster-whisper
Faster Whisper transcription with CTranslate2
☆24,302 · Nov 19, 2025 · Updated 7 months ago
openai / whisper
Robust Speech Recognition via Large-Scale Weak Supervision
☆104,995 · Apr 15, 2026 · Updated 3 months ago
lipku / LiveTalking
Real time interactive streaming digital human
☆8,410 · Jul 5, 2026 · Updated last week
langgenius / dify
Production-ready platform for agentic workflow development.
☆148,938 · Updated this week
chatchat-space / Langchain-Chatchat
Langchain-Chatchat (formerly Langchain-ChatGLM): RAG and Agent applications built on Langchain and language models such as ChatGLM, Qwen, and Llama | Langchain-Chatchat (formerly langchain…
☆38,426 · Nov 10, 2025 · Updated 8 months ago
hiyouga / LlamaFactory
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
☆73,301 · Updated this week
coqui-ai / TTS
🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
☆45,756 · Aug 16, 2024 · Updated last year
labring / FastGPT
FastGPT is a knowledge-based platform built on LLMs, offering a comprehensive suite of out-of-the-box capabilities such as data process…
☆28,980 · Updated this week
zai-org / GLM-4-Voice
GLM-4-Voice | End-to-end Chinese-English spoken dialogue model
☆3,204 · Dec 5, 2024 · Updated last year
modelscope / ClearerVoice-Studio
An AI-Powered Speech Processing Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Enhancement, Separation, and Target Spe…
☆4,304 · Aug 14, 2025 · Updated 11 months ago
OpenBMB / MiniCPM-V
A Pocket-Sized MLLM for Ultra-Efficient Image and Video Understanding on Your Phone
☆25,891 · Jun 25, 2026 · Updated 3 weeks ago
netease-youdao / EmotiVoice
EmotiVoice 😊: a Multi-Voice and Prompt-Controlled TTS Engine
☆8,487 · Aug 13, 2024 · Updated last year
rany2 / edge-tts
Use Microsoft Edge's online text-to-speech service from Python WITHOUT needing Microsoft Edge or Windows or an API key
☆11,505 · Mar 22, 2026 · Updated 3 months ago
QwenLM / Qwen-Audio
The official repo of Qwen-Audio (通义千问-Audio) chat & pretrained large audio language model proposed by Alibaba Cloud.
☆1,909 · Jul 5, 2024 · Updated 2 years ago
SWivid / F5-TTS
Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"
☆14,952 · Jul 5, 2026 · Updated last week
open-mmlab / Amphion
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junio…
☆9,931 · Mar 25, 2026 · Updated 3 months ago
speechbrain / speechbrain
A PyTorch-based Speech Toolkit
☆11,685 · Jun 15, 2026 · Updated last month
stepfun-ai / Step-Audio
☆31 · Mar 16, 2026 · Updated 4 months ago
TMElyralab / MuseTalk
MuseTalk: Real-Time High Quality Lip Synchronization with Latent Space Inpainting
☆6,172 · Sep 26, 2025 · Updated 9 months ago
xorbitsai / inference
Swap GPT for any LLM by changing a single line of code. Xinference lets you run open-source, speech, and multimodal models on cloud, on-p…
☆9,433 · Updated this week
m-bain / whisperX
WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)
☆23,084 · Updated this week
harry0703 / AudioNotes
Quickly extract audio and video content and organize it into structured Markdown notes
☆2,186 · Updated this week
netease-youdao / QAnything
Question and Answer based on Anything.
☆14,032 · Mar 24, 2025 · Updated last year
infiniflow / ragflow
RAGFlow is a leading open-source Retrieval-Augmented Generation (RAG) engine that fuses cutting-edge RAG with Agent capabilities to creat…
☆85,109 · Updated this week
pyannote / pyannote-audio
Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker…
☆10,281 · Updated this week
espnet / espnet
End-to-End Speech Processing Toolkit
☆9,890 · Updated this week
songquanpeng / one-api
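LLM API management and distribution system supporting mainstream models including OpenAI, Azure, Anthropic Claude, Google Gemini, DeepSeek, ByteDance Doubao, ChatGLM, ERNIE Bot (Wenxin Yiyan), iFlytek Spark, Tongyi Qianwen, 360 Zhinao, and Tencent Hunyuan; unified API adaptation, usable for key …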
☆35,743 · Jan 9, 2026 · Updated 6 months ago