PaddlePaddle/PaddleSpeech

Easy-to-use Speech Toolkit including Self-Supervised Learning model, SOTA/Streaming ASR with punctuation, Streaming TTS with text frontend, Speaker Verification System, End-to-End Speech Translation and Keyword Spotting. Won NAACL2022 Best Demo Award.

☆12,645

Alternatives and similar repositories for PaddleSpeech

Users who are interested in PaddleSpeech are comparing it to the libraries listed below.

modelscope / FunASR
Open-source speech recognition toolkit for training, inference, streaming ASR, VAD, punctuation, speaker diarization pipelines, and OpenA…
☆19,256 · Updated this week
wenet-e2e / wenet
Production First and Production Ready End-to-End Speech Recognition Toolkit
☆5,170 · Jun 15, 2026 · Updated last month
espnet / espnet
End-to-End Speech Processing Toolkit
☆9,890 · Updated this week
nl8590687 / ASRT_SpeechRecognition
A Deep-Learning-Based Chinese Speech Recognition System
☆8,375 · Apr 10, 2026 · Updated 3 months ago
coqui-ai / TTS
🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
☆45,756 · Aug 16, 2024 · Updated last year
speechbrain / speechbrain
A PyTorch-based Speech Toolkit
☆11,685 · Jun 15, 2026 · Updated last month
kaldi-asr / kaldi
kaldi-asr/kaldi is the official location of the Kaldi project.
☆15,428 · Sep 22, 2025 · Updated 9 months ago
babysor / MockingBird
🚀Clone a voice in 5 seconds to generate arbitrary speech in real-time
☆36,922 · Mar 3, 2026 · Updated 4 months ago
mozilla / DeepSpeech
DeepSpeech is an open source embedded (offline, on-device) speech-to-text engine which can run in real time on devices ranging from a Ras…
☆26,770 · Jun 19, 2025 · Updated last year
openai / whisper
Robust Speech Recognition via Large-Scale Weak Supervision
☆104,995 · Apr 15, 2026 · Updated 3 months ago
FunAudioLLM / CosyVoice
Multi-lingual large voice generation model, providing inference, training and deployment full-stack ability.
☆22,188 · May 25, 2026 · Updated last month
jaywalnut310 / vits
VITS: Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech
☆7,882 · Dec 6, 2023 · Updated 2 years ago
PaddlePaddle / PaddleOCR
Turn any PDF or image document into structured data for your AI. A powerful, lightweight OCR toolkit that bridges the gap between images/…
☆85,555 · Updated this week
2noise / ChatTTS
A generative speech model for daily dialogue.
☆39,619 · Apr 10, 2026 · Updated 3 months ago
suno-ai / bark
🔊 Text-Prompted Generative Audio Model
☆39,197 · Aug 19, 2024 · Updated last year
FunAudioLLM / SenseVoice
Open-source SenseVoiceSmall model for Mandarin, Cantonese, English, Japanese, and Korean ASR, language ID, emotion recognition, and audio…
☆8,864 · Updated this week
TensorSpeech / TensorFlowTTS
TensorFlowTTS: Real-Time State-of-the-art Speech Synthesis for Tensorflow 2 (supported including English, French, Korean, Chinese, Germa…
☆3,991 · Jul 5, 2024 · Updated 2 years ago
zai-org / ChatGLM-6B
ChatGLM-6B: An Open Bilingual Dialogue Language Model
☆41,024 · Jun 27, 2024 · Updated 2 years ago
fishaudio / fish-speech
SOTA Open Source TTS
☆31,273 · Jun 9, 2026 · Updated last month
CorentinJ / Real-Time-Voice-Cloning
Clone a voice in 5 seconds to generate arbitrary speech in real-time
☆60,028 · Mar 9, 2026 · Updated 4 months ago
RVC-Boss / GPT-SoVITS
1 min voice data can also be used to train a good TTS model! (few shot voice cloning)
☆59,806 · Updated this week
yeyupiaoling / PPASR
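End-to-end Chinese speech recognition built on PaddlePaddle, from beginner to hands-on practice, with very simple introductory examples and highly practical enterprise projects. Supports the currently most popular DeepSpeech2, Conformer, and Squeezeformer models
☆873 · Dec 17, 2025 · Updated 7 months ago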
netease-youdao / EmotiVoice
EmotiVoice 😊: a Multi-Voice and Prompt-Controlled TTS Engine
☆8,487 · Aug 13, 2024 · Updated last year
facebookresearch / fairseq
Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
☆32,240 · Sep 30, 2025 · Updated 9 months ago
PaddlePaddle / PaddleNLP
Easy-to-use and powerful LLM and SLM library with awesome model zoo.
☆12,954 · May 23, 2026 · Updated last month
mozilla / TTS
Deep learning for Text to Speech (Discussion forum: https://discourse.mozilla.org/c/tts)
☆10,161 · Nov 9, 2023 · Updated 2 years ago
OpenTalker / SadTalker
[CVPR 2023] SadTalker: Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation
☆13,952 · Jun 26, 2024 · Updated 2 years ago
NVIDIA-NeMo / Speech
A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Auto…
☆17,774 · Updated this week
Rudrabha / Wav2Lip
This repository contains the codes of "A Lip Sync Expert Is All You Need for Speech to Lip Generation In the Wild", published at ACM Mult…
☆13,096 · Jun 22, 2025 · Updated last year
PlayVoice / vits_chinese
Best practice TTS based on BERT and VITS with some Natural Speech Features Of Microsoft; Support ONNX streaming out!
☆1,231 · Feb 5, 2024 · Updated 2 years ago
chatchat-space / Langchain-Chatchat
Langchain-Chatchat (formerly Langchain-ChatGLM): RAG and Agent applications based on Langchain and language models such as ChatGLM, Qwen, and Llama | Langchain-Chatchat (formerly langchain…
☆38,426 · Nov 10, 2025 · Updated 8 months ago
ming024 / FastSpeech2
An implementation of Microsoft's "FastSpeech 2: Fast and High-Quality End-to-End Text to Speech"
☆2,185 · Oct 27, 2023 · Updated 2 years ago
svc-develop-team / so-vits-svc
SoftVC VITS Singing Voice Conversion
☆28,146 · Nov 11, 2023 · Updated 2 years ago
snakers4 / silero-vad
Silero VAD: pre-trained enterprise-grade Voice Activity Detector
☆9,591 · Jul 3, 2026 · Updated 2 weeks ago
JiehangXie / PaddleBoBo
A virtual streamer built with PaddlePaddle
☆1,062 · Mar 12, 2023 · Updated 3 years ago
hpcaitech / ColossalAI
Making large AI models cheaper, faster and more accessible
☆41,413 · Updated this week
SYSTRAN / faster-whisper
Faster Whisper transcription with CTranslate2
☆24,302 · Nov 19, 2025 · Updated 7 months ago
PaddlePaddle / Parakeet
PAddle PARAllel text-to-speech toolKIT (supporting Tacotron2, Transformer TTS, FastSpeech2/FastPitch, SpeedySpeech, WaveFlow and Paralle…
☆623 · Nov 19, 2021 · Updated 4 years ago
fishaudio / Bert-VITS2
vits2 backbone with multilingual-bert
☆8,773 · Updated this week