FunAudioLLM/SenseVoice

Open-source SenseVoiceSmall model for Mandarin, Cantonese, English, Japanese, and Korean ASR, language ID, emotion recognition, and audio event detection.

☆8,888

Alternatives and similar repositories for SenseVoice

Users who are interested in SenseVoice are comparing it to the repositories listed below.

FunAudioLLM / CosyVoice
Multilingual large voice generation model with full-stack support for inference, training, and deployment.
☆22,265 · May 25, 2026 · Updated last month
modelscope / FunASR
Open-source speech recognition toolkit for training, inference, streaming ASR, VAD, punctuation, speaker diarization pipelines, and OpenA…
☆19,333 · Updated this week
modelscope / 3D-Speaker
A Repository for Single- and Multi-modal Speaker Verification, Speaker Recognition and Speaker Diarization
☆3,055 · Dec 8, 2025 · Updated 7 months ago
fishaudio / fish-speech
SOTA Open Source TTS
☆31,327 · Jun 9, 2026 · Updated last month
pengzhendong / streaming-sensevoice
Pseudo Streaming SenseVoice with Hotwords
☆465 · Jun 15, 2026 · Updated last month
2noise / ChatTTS
A generative speech model for daily dialogue.
☆39,651 · Apr 10, 2026 · Updated 3 months ago
QwenLM / Qwen2-Audio
The official repo of Qwen2-Audio chat & pretrained large audio language model proposed by Alibaba Cloud.
☆2,088 · Apr 21, 2025 · Updated last year
zai-org / GLM-4-Voice
GLM-4-Voice | End-to-end Chinese-English speech dialogue model
☆3,204 · Dec 5, 2024 · Updated last year
FireRedTeam / FireRedASR
Open-source industrial-grade ASR models supporting Mandarin, Chinese dialects and English, achieving a new SOTA on public Mandarin ASR be…
☆1,936 · Feb 25, 2026 · Updated 4 months ago
k2-fsa / sherpa-onnx
Speech-to-text, text-to-speech, speaker diarization, speech enhancement, source separation, and VAD using next-gen Kaldi with onnxruntime…
☆13,649 · Updated this week
lovemefan / SenseVoice.cpp
Port of FunASR's SenseVoice model in C/C++
☆567 · Dec 19, 2025 · Updated 7 months ago
RVC-Boss / GPT-SoVITS
One minute of voice data is enough to train a good TTS model (few-shot voice cloning).
☆59,952 · Jul 13, 2026 · Updated last week
SWivid / F5-TTS
Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"
☆14,978 · Jul 5, 2026 · Updated 2 weeks ago
snakers4 / silero-vad
Silero VAD: pre-trained enterprise-grade Voice Activity Detector
☆9,615 · Updated this week
SYSTRAN / faster-whisper
Faster Whisper transcription with CTranslate2
☆24,375 · Nov 19, 2025 · Updated 8 months ago
stepfun-ai / Step-Audio
☆32 · Mar 16, 2026 · Updated 4 months ago
gpt-omni / mini-omni
Open-source multimodal large language model that can hear and talk while thinking, featuring real-time end-to-end speech input and streaming…
☆3,562 · Nov 5, 2024 · Updated last year
modelscope / ClearerVoice-Studio
An AI-Powered Speech Processing Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Enhancement, Separation, and Target Spe…
☆4,314 · Aug 14, 2025 · Updated 11 months ago
QwenLM / Qwen-Audio
The official repo of Qwen-Audio (通义千问-Audio) chat & pretrained large audio language model proposed by Alibaba Cloud.
☆1,913 · Jul 5, 2024 · Updated 2 years ago
FunAudioLLM / FunAudioLLM-APP
☆384 · Jul 22, 2024 · Updated last year
antgroup / echomimic
[AAAI 2025] EchoMimic: Lifelike Audio-Driven Portrait Animations through Editable Landmark Conditioning
☆4,269 · Apr 7, 2026 · Updated 3 months ago
0x5446 / api4sensevoice
API and WebSocket server for SenseVoice. It has inherited some enhanced features, such as VAD detection, real-time streaming recognition,…
☆538 · Oct 23, 2024 · Updated last year
netease-youdao / EmotiVoice
EmotiVoice 😊: a Multi-Voice and Prompt-Controlled TTS Engine
☆8,489 · Aug 13, 2024 · Updated last year
index-tts / index-tts
An Industrial-Level Controllable and Efficient Zero-Shot Text-To-Speech System
☆21,991 · Updated this week
open-mmlab / Amphion
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junio…
☆9,957 · Mar 25, 2026 · Updated 3 months ago
FunAudioLLM / Fun-ASR
Open-source LLM-based ASR model family for Chinese, dialect, accent, and multilingual speech, with FunASR, vLLM, streaming, and llama.cpp…
☆1,410 · Updated this week
PaddlePaddle / PaddleSpeech
Easy-to-use Speech Toolkit including Self-Supervised Learning model, SOTA/Streaming ASR with punctuation, Streaming TTS with text fronten…
☆12,649 · Jun 21, 2026 · Updated 3 weeks ago
myshell-ai / MeloTTS
High-quality multilingual text-to-speech library by MyShell.ai. Supports English, Spanish, French, Chinese, Japanese, and Korean.
☆7,541 · Dec 24, 2024 · Updated last year
lifeiteng / OmniSenseVoice
Omni SenseVoice: High-Speed Speech Recognition with word timestamps 🗣️🎯
☆897 · Dec 10, 2025 · Updated 7 months ago
wenet-e2e / wenet
Production First and Production Ready End-to-End Speech Recognition Toolkit
☆5,170 · Jun 15, 2026 · Updated last month
ddlBoJack / emotion2vec
[ACL 2024] Official PyTorch code for extracting features and training downstream models with emotion2vec: Self-Supervised Pre-Training fo…
☆1,157 · Dec 23, 2024 · Updated last year
MoonshotAI / Kimi-Audio
Kimi-Audio, an open-source audio foundation model excelling in audio understanding, generation, and conversation
☆4,677 · Jun 21, 2025 · Updated last year
rany2 / edge-tts
Use Microsoft Edge's online text-to-speech service from Python WITHOUT needing Microsoft Edge or Windows or an API key
☆11,532 · Mar 22, 2026 · Updated 3 months ago
TMElyralab / MuseTalk
MuseTalk: Real-Time High Quality Lip Synchronization with Latent Space Inpainting
☆6,194 · Sep 26, 2025 · Updated 9 months ago
OpenBMB / MiniCPM-V
A Pocket-Sized MLLM for Ultra-Efficient Image and Video Understanding on Your Phone
☆25,930 · Jun 25, 2026 · Updated 3 weeks ago
SparkAudio / Spark-TTS
Spark-TTS Inference Code
☆10,998 · Apr 9, 2025 · Updated last year
shivammehta25 / Matcha-TTS
[ICASSP 2024] 🍵 Matcha-TTS: A fast TTS architecture with conditional flow matching
☆1,332 · Updated this week
bytedance / MegaTTS3
☆6,083 · Jun 15, 2026 · Updated last month
BytedanceSpeech / seed-tts-eval
☆1,574 · Jun 14, 2024 · Updated 2 years ago