jishengpeng/ControlSpeech

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/jishengpeng/ControlSpeech)

jishengpeng / ControlSpeech

[ACL 2025 Main] ControlSpeech: Towards Simultaneous Zero-shot Speaker Cloning and Zero-shot Language Style Control With Decoupled Codec

☆276

Alternatives and similar repositories for ControlSpeech

Users that are interested in ControlSpeech are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

jishengpeng / Languagecodec
View on GitHub
[ACL 2025 Oral] Language-Codec: Reducing the Gaps Between Discrete Codec Representation and Speech Language Models
☆208Jun 25, 2025Updated last year
jishengpeng / TextrolSpeech
View on GitHub
[ICASSP 2024] TextrolSpeech: A Text Style Control Speech Corpus With Codec Language Text-to-Speech Models
☆187Nov 22, 2024Updated last year
line / LibriTTS-P
View on GitHub
LibriTTS-P: A Corpus with Speaking Style and Speaker Identity Prompts for Text-to-Speech and Style Captioning
☆161Jun 13, 2024Updated 2 years ago
yangdongchao / SimpleSpeech
View on GitHub
The open source code for SimpleSpeech series
☆147Oct 8, 2024Updated last year
line / promptttspp
View on GitHub
PromptTTS++: Controlling Speaker Identity in Prompt-Based Text-To-Speech Using Natural Language Descriptions
☆86Oct 11, 2024Updated last year
GPUs on demand by Runpod - Special Offer Available • Ad
Run AI, ML, and HPC workloads on powerful cloud GPUs—without limits or wasted spend. Deploy GPUs in under a minute and pay by the second.
Plachtaa / FAcodec
View on GitHub
Training code for FAcodec presented in NaturalSpeech3
☆244Aug 26, 2024Updated last year
jishengpeng / WavTokenizer
View on GitHub
[ICLR 2025] SOTA discrete acoustic codec models with 40/75 tokens per second for audio language modeling
☆1,310Mar 2, 2025Updated last year
winddori2002 / DEX-TTS
View on GitHub
DEX-TTS: Diffusion-based EXpressive TTS with Style Modeling on Time Variability
☆108Jan 17, 2025Updated last year
hayeong0 / DDDM-VC
View on GitHub
Official Pytorch Implementation for "DDDM-VC: Decoupled Denoising Diffusion Models with Disentangled Representation and Prior Mixup for V…
☆244Jul 31, 2024Updated last year
walker-hyf / GPT-Talker
View on GitHub
Generative Expressive Conversational Speech Synthesis (Accepted by MM'2024)
☆78Nov 1, 2024Updated last year
thuhcsi / VoxInstruct
View on GitHub
VoxInstruct: Expressive Human Instruction-to-Speech Generation with Unified Multilingual Codec Language Modelling
☆100Nov 9, 2024Updated last year
yangdongchao / LLM-Codec
View on GitHub
The open source code for LLM-Codec
☆147Aug 18, 2024Updated last year
X-LANCE / VoiceFlow-TTS
View on GitHub
[ICASSP 2024] This is the official code for "VoiceFlow: Efficient Text-to-Speech with Rectified Flow Matching"
☆376Sep 3, 2024Updated last year
p0p4k / Matcha-TTS-2
View on GitHub
E2E TTS using Conditional Flow Matching (Experimental*)
☆71Nov 10, 2023Updated 2 years ago
Deploy on Railway without the complexity - Free Credits Offer • Ad
Connect your repo and Railway handles the rest with instant previews. Quickly provision container image services, databases, and storage volumes.
ajd12342 / paraspeechcaps
View on GitHub
Codebase for 'Scaling Rich Style-Prompted Text-to-Speech Datasets'
☆165Mar 26, 2026Updated 4 months ago
X-E-Speech / X-E-Speech-code
View on GitHub
X-E-Speech: Joint Training Framework of Non-Autoregressive Cross-lingual Emotional Text-to-Speech and Voice Conversion
☆112Apr 1, 2024Updated 2 years ago
thuhcsi / SpeechCraft
View on GitHub
The official repository of SpeechCraft dataset, a large-scale expressive bilingual speech dataset with natural language descriptions.
☆198Feb 28, 2026Updated 5 months ago
zhenye234 / X-Codec-2.0
View on GitHub
Codec for paper: LLaSA: Scaling Train-time and Inference-time Compute for LLaMA-based Speech Synthesis
☆361Jun 25, 2026Updated last month
LSimon95 / megatts2
View on GitHub
Unoffical implementation of Megatts2
☆285Mar 23, 2024Updated 2 years ago
zhenye234 / xcodec
View on GitHub
AAAI 2025: Codec Does Matter: Exploring the Semantic Shortcoming of Codec for Audio Language Model
☆308Oct 12, 2025Updated 9 months ago
KdaiP / StableTTS
View on GitHub
Next-generation TTS model using flow-matching and DiT, inspired by Stable Diffusion 3
☆437Sep 13, 2024Updated last year
Choddeok / EmoSphere-TTS
View on GitHub
[INTERSPEECH 2024] The official implementation of EmoSphere-TTS: Emotional Style and Intensity Modeling via Spherical Emotion Vector for …
☆182Jul 16, 2026Updated 2 weeks ago
zhenye234 / FlashSpeech
View on GitHub
ACM MM 2024 FlashSpeech: Efficient Zero-Shot Speech Synthesis
☆156Sep 20, 2024Updated last year
1-Click AI Models by DigitalOcean Gradient • Ad
Deploy popular AI models on DigitalOcean Gradient GPU virtual machines with just a single click. Zero configuration with optimized deployments.
glory20h / VoiceLDM
View on GitHub
VoiceLDM: Text-to-Speech with Environmental Context
☆194Aug 9, 2024Updated last year
WangHelin1997 / SSR-Speech
View on GitHub
SSR-Speech: Towards Stable, Safe and Robust Zero-shot Speech Editing and Synthesis
☆154Jan 1, 2025Updated last year
yangdongchao / ALMTokenizer
View on GitHub
The demo page for ALMTokenizer
☆59Apr 14, 2025Updated last year
hayeong0 / Diff-HierVC
View on GitHub
Official Pytorch Implementation of "Diff-HierVC: Diffusion-based Hierarchical Voice Conversion with Robust Pitch Generation and Masked Pr…
☆237Jul 3, 2024Updated 2 years ago
voidful / Codec-SUPERB
View on GitHub
Audio Codec Speech processing Universal PERformance Benchmark
☆308Jul 4, 2026Updated 3 weeks ago
yzGuu830 / efficient-speech-codec
View on GitHub
[EMNLP 2024] ESC: Efficient Speech Coding with Cross-Scale Residual Vector Quantized Transformers
☆126Mar 20, 2025Updated last year
sh-lee-prml / PeriodWave
View on GitHub
The official Implementation of PeriodWave and PeriodWave-Turbo
☆226Apr 14, 2025Updated last year
lifeiteng / naturalspeech3_facodec
View on GitHub
FACodec: Speech Codec with Attribute Factorization used for NaturalSpeech 3
☆254Apr 20, 2024Updated 2 years ago
liutaocode / TTS-arxiv-daily
View on GitHub
Automatically Update Text-to-speech (TTS) Papers Daily using Github Actions (Update Every 12th hours)
☆663Updated this week
Deploy on Railway without the complexity - Free Credits Offer • Ad
Connect your repo and Railway handles the rest with instant previews. Quickly provision container image services, databases, and storage volumes.
Audio-Foundation-Models / ConversationTTS
View on GitHub
☆101Jan 19, 2026Updated 6 months ago
IDEA-Emdoor-Lab / DistilCodec
View on GitHub
A Neural Audio Codec (NAC) for Universal Audio
☆47May 30, 2025Updated last year
declare-lab / HyperTTS
View on GitHub
☆40Apr 15, 2024Updated 2 years ago
Choddeok / EmoSpherepp
View on GitHub
[TAFFC 2025] The official implementation of EmoSphere++: Emotion-Controllable Zero-Shot Text-to-Speech via Emotion-Adaptive Spherical Vec…
☆129Jul 16, 2026Updated 2 weeks ago
mct10 / RepCodec
View on GitHub
Models and code for RepCodec: A Speech Representation Codec for Speech Tokenization
☆196Jul 12, 2024Updated 2 years ago
y-ren16 / TiCodec
View on GitHub
☆81Aug 11, 2025Updated 11 months ago
adelacvg / ttts
View on GitHub
Train the next generation of TTS systems.
☆169Sep 13, 2024Updated last year