modelscope/ClearerVoice-Studio

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/modelscope/ClearerVoice-Studio)

modelscope / ClearerVoice-Studio

An AI-Powered Speech Processing Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Enhancement, Separation, and Target Speaker Extraction, etc.

☆4,314

Alternatives and similar repositories for ClearerVoice-Studio

Users that are interested in ClearerVoice-Studio are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

Xiaobin-Rong / gtcrn
View on GitHub
The official implementation of GTCRN, an ultra-lightweight SE model.
☆692Jan 18, 2026Updated 6 months ago
FunAudioLLM / CosyVoice
View on GitHub
Multi-lingual large voice generation model, providing inference, training and deployment full-stack ability.
☆22,265May 25, 2026Updated last month
wenet-e2e / wesep
View on GitHub
Target Speaker Extraction Toolkit
☆298Oct 4, 2025Updated 9 months ago
modelscope / 3D-Speaker
View on GitHub
A Repository for Single- and Multi-modal Speaker Verification, Speaker Recognition and Speaker Diarization
☆3,055Dec 8, 2025Updated 7 months ago
gemelo-ai / vocos
View on GitHub
Vocos: Closing the gap between time-domain and Fourier-based neural vocoders for high-quality audio synthesis
☆1,142Aug 7, 2024Updated last year
1-Click AI Models by DigitalOcean Gradient • Ad
Deploy popular AI models on DigitalOcean Gradient GPU virtual machines with just a single click. Zero configuration with optimized deployments.
KdaiP / StableTTS
View on GitHub
Next-generation TTS model using flow-matching and DiT, inspired by Stable Diffusion 3
☆438Sep 13, 2024Updated last year
FireRedTeam / FireRedTTS
View on GitHub
An Open-Sourced LLM-empowered Foundation TTS System
☆907Sep 28, 2025Updated 9 months ago
modelscope / FunCodec
View on GitHub
FunCodec is a research-oriented toolkit for audio quantization and downstream applications, such as text-to-speech synthesis, music gener…
☆445Jan 25, 2024Updated 2 years ago
open-mmlab / Amphion
View on GitHub
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junio…
☆9,957Mar 25, 2026Updated 3 months ago
FunAudioLLM / SenseVoice
View on GitHub
Open-source SenseVoiceSmall model for Mandarin, Cantonese, English, Japanese, and Korean ASR, language ID, emotion recognition, and audio…
☆8,888Updated this week
FunAudioLLM / FunMusic
View on GitHub
A fundamental toolkit designed for music, song, and audio generation
☆1,369May 20, 2025Updated last year
shivammehta25 / Matcha-TTS
View on GitHub
[ICASSP 2024] 🍵 Matcha-TTS: A fast TTS architecture with conditional flow matching
☆1,332Updated this week
resemble-ai / resemble-enhance
View on GitHub
AI powered speech denoising and enhancement
☆2,363Dec 3, 2024Updated last year
modelscope / FunASR
View on GitHub
Open-source speech recognition toolkit for training, inference, streaming ASR, VAD, punctuation, speaker diarization pipelines, and OpenA…
☆19,333Updated this week
Virtual machines for every use case on DigitalOcean • Ad
Get dependable uptime with 99.99% SLA, simple security tools, and predictable monthly pricing with DigitalOcean's virtual machines, called Droplets.
xingchensong / S3Tokenizer
View on GitHub
Reverse Engineering of Supervised Semantic Speech Tokenizer (S3Tokenizer) proposed in CosyVoice
☆517Dec 22, 2025Updated 6 months ago
wenet-e2e / wespeaker
View on GitHub
Research and Production Oriented Speaker Verification, Recognition and Diarization Toolkit
☆1,358Jul 8, 2026Updated last week
zai-org / GLM-4-Voice
View on GitHub
GLM-4-Voice | 端到端中英语音对话模型
☆3,204Dec 5, 2024Updated last year
SWivid / F5-TTS
View on GitHub
Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"
☆14,978Jul 5, 2026Updated 2 weeks ago
X-LANCE / VoiceFlow-TTS
View on GitHub
[ICASSP 2024] This is the official code for "VoiceFlow: Efficient Text-to-Speech with Rectified Flow Matching"
☆375Sep 3, 2024Updated last year
k2-fsa / ZipVoice
View on GitHub
Fast and High-Quality Zero-Shot Text-to-Speech with Flow Matching
☆1,014Dec 2, 2025Updated 7 months ago
TEN-framework / ten-vad
View on GitHub
Voice Activity Detector (VAD) : low-latency, high-performance and lightweight
☆2,198Feb 2, 2026Updated 5 months ago
Plachtaa / seed-vc
View on GitHub
zero-shot voice conversion & singing voice conversion, with real-time support
☆3,878Apr 20, 2025Updated last year
stepfun-ai / Step-Audio
View on GitHub
☆32Mar 16, 2026Updated 4 months ago
Virtual machines for every use case on DigitalOcean • Ad
Get dependable uptime with 99.99% SLA, simple security tools, and predictable monthly pricing with DigitalOcean's virtual machines, called Droplets.
yangdongchao / UniAudio
View on GitHub
The Open Source Code of UniAudio
☆606Jul 22, 2024Updated last year
sp-uhh / sgmse
View on GitHub
Score-based Generative Models (Diffusion Models) for Speech Enhancement and Dereverberation
☆764May 12, 2026Updated 2 months ago
MoonshotAI / Kimi-Audio
View on GitHub
Kimi-Audio, an open-source audio foundation model excelling in audio understanding, generation, and conversation
☆4,677Jun 21, 2025Updated last year
yxlu-0102 / MP-SENet
View on GitHub
Explicit Estimation of Magnitude and Phase Spectra in Parallel for High-Quality Speech Enhancement
☆492May 19, 2025Updated last year
FireRedTeam / FireRedASR
View on GitHub
Open-source industrial-grade ASR models supporting Mandarin, Chinese dialects and English, achieving a new SOTA on public Mandarin ASR be…
☆1,936Feb 25, 2026Updated 4 months ago
fishaudio / fish-speech
View on GitHub
SOTA Open Source TTS
☆31,327Jun 9, 2026Updated last month
snakers4 / silero-vad
View on GitHub
Silero VAD: pre-trained enterprise-grade Voice Activity Detector
☆9,615Updated this week
X-LANCE / SLAM-LLM
View on GitHub
A Framework for Speech, Language, Audio, Music Processing with Large Language Model
☆1,047Jan 15, 2026Updated 6 months ago
ScottishFold007 / TTSAudioNormalizer
View on GitHub
TTSAudioNormalizer is a specialized tool for TTS data production, featuring descriptive statistical analysis of audio loudness and loud…
☆112Dec 20, 2024Updated last year
Managed hosting for WordPress and PHP on Cloudways • Ad
Managed hosting for WordPress, Magento, Laravel, or PHP apps, on multiple cloud providers. Deploy in minutes on Cloudways by DigitalOcean.
jishengpeng / WavTokenizer
View on GitHub
[ICLR 2025] SOTA discrete acoustic codec models with 40/75 tokens per second for audio language modeling
☆1,305Mar 2, 2025Updated last year
ZhangXInFD / SpeechTokenizer
View on GitHub
This is the code for the SpeechTokenizer presented in the SpeechTokenizer: Unified Speech Tokenizer for Speech Language Models. Samples a…
☆658Jun 9, 2024Updated 2 years ago
ddlBoJack / emotion2vec
View on GitHub
[ACL 2024] Official PyTorch code for extracting features and training downstream models with emotion2vec: Self-Supervised Pre-Training fo…
☆1,157Dec 23, 2024Updated last year
Rikorose / DeepFilterNet
View on GitHub
Noise supression using deep filtering
☆4,476Oct 17, 2024Updated last year
JusperLee / TIGER
View on GitHub
TIGER: Time-frequency Interleaved Gain Extraction and Reconstruction for Efficient Speech Separation
☆433Apr 20, 2026Updated 3 months ago
aliutkus / speechmetrics
View on GitHub
A wrapper around speech quality metrics MOSNet, BSSEval, STOI, PESQ, SRMR, SISDR
☆1,050Jul 5, 2023Updated 3 years ago
zhenye234 / X-Codec-2.0
View on GitHub
Codec for paper: LLaSA: Scaling Train-time and Inference-time Compute for LLaMA-based Speech Synthesis
☆360Jun 25, 2026Updated 3 weeks ago