Mddct/simple-tts

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/Mddct/simple-tts)

Mddct / simple-tts

（WIP）long form speech generatoins

☆30

Alternatives and similar repositories for simple-tts

Users that are interested in simple-tts are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

Mddct / cosyvoice2-flow-optimized
View on GitHub
faster inference
☆27Jan 20, 2025Updated last year
Mddct / transformer-vocos
View on GitHub
☆35Sep 6, 2025Updated 10 months ago
pengzhendong / audiolab
View on GitHub
A streaming audio reader, processor, and writer built on top of soundfile, and PyAV (bindings for FFmpeg)
☆39Mar 31, 2026Updated 3 months ago
lifeiteng / NotebookTTS
View on GitHub
Text-To-Speech for NotebookLM
☆39Jul 20, 2025Updated last year
Mddct / usm-tokenizer
View on GitHub
semantic tokenizer for speech and music
☆20Jul 6, 2025Updated last year
End-to-end encrypted email - Proton Mail • Ad
Special offer: 40% Off Yearly / 80% Off First Month. All Proton services are open source and independently audited for security.
MuyangDu / T5Voice
View on GitHub
T5Voice is a lightweight PyTorch implementation of T5-based text-to-speech synthesis, supporting both streaming and non-streaming speech …
☆28Nov 7, 2025Updated 8 months ago
pengzhendong / ngram-punctuator
View on GitHub
An N-gram punctuator for Chinese and English.
☆18Oct 14, 2025Updated 9 months ago
yangdongchao / RSTnet
View on GitHub
Real-time Speech-Text Foundation Model Toolkit (wip)
☆256Mar 26, 2025Updated last year
rishikksh20 / MiniMax-TTS-pytorch
View on GitHub
Try to replicate the architecture of MiniMaxTTS mentioned in it's technical report
☆47Sep 2, 2025Updated 10 months ago
Anvarjon / Age-Gender-Classification
View on GitHub
Official implementation of the paper titled "Age and Gender Recognition Using a Convolutional Neural Network with a Specially Designed Mu…
☆28Mar 5, 2024Updated 2 years ago
pengzhendong / g2p-mix
View on GitHub
Grapheme-to-Phoneme for Mixed Chinese (Mandarin or Cantonese) and English.
☆115Dec 2, 2025Updated 7 months ago
ScottishFold007 / TTSAudioNormalizer
View on GitHub
TTSAudioNormalizer is a specialized tool for TTS data production, featuring descriptive statistical analysis of audio loudness and loud…
☆112Dec 20, 2024Updated last year
ArenAcikgoz / Whisper-Alignment
View on GitHub
Forced alignment decoder for Whisper.
☆16Mar 13, 2024Updated 2 years ago
flamed-tts / Flamed-TTS
View on GitHub
This repository implement a novel zero-shot TTS framework, named Flamed-TTS, focusing on the efficient generation and dynamic pacing in …
☆57Aug 9, 2025Updated 11 months ago
Serverless GPU API endpoints on Runpod - Get Bonus Credits • Ad
Skip the infrastructure headaches. Auto-scaling, pay-as-you-go, no-ops approach lets you focus on innovating your application.
xkx-hub / KALL-E
View on GitHub
[AAAI 2026 oral] KALL-E:Autoregressive Speech Synthesis with Next-Distribution Prediction
☆42Sep 25, 2025Updated 9 months ago
YangXusheng-yxs / CodecFormer_5Hz
View on GitHub
☆35Oct 23, 2025Updated 8 months ago
luotianze666 / WaveFM
View on GitHub
[NAACL 2025] WaveFM: A High-Fidelity and Efficient Vocoder Based on Flow Matching
☆133Apr 8, 2026Updated 3 months ago
walker-hyf / NCSSD
View on GitHub
Generative Expressive Conversational Speech Synthesis (Accepted by MM'2024)
☆61Nov 1, 2024Updated last year
samsad35 / code-ancogen
View on GitHub
[ICASSP 2025] AnCoGen: Analysis, Control and Generation of Speech with a Masked Autoencoder
☆14Mar 11, 2025Updated last year
bfs18 / rfwave
View on GitHub
☆151Apr 25, 2025Updated last year
LAION-AI / emotion-annotations
View on GitHub
☆110Updated this week
parrot-tts / Parrot-TTS
View on GitHub
Official Code for ParrotTTS
☆58Oct 13, 2024Updated last year
ictnlp / SLED-TTS
View on GitHub
Streamable Text-to-Speech model using a language modeling approach, without vector quantization
☆108May 20, 2025Updated last year
Virtual machines for every use case on DigitalOcean • Ad
Get dependable uptime with 99.99% SLA, simple security tools, and predictable monthly pricing with DigitalOcean's virtual machines, called Droplets.
pengzhendong / compute-wer
View on GitHub
Compute WER and SER for speech recognition evaluation
☆27Jun 6, 2026Updated last month
yukara-ikemiya / Open-Miipher-2
View on GitHub
PyTorch implementation of Miipher-2 [2025] which is a speech restoration model by Google DeepMind
☆70Sep 22, 2025Updated 9 months ago
Mddct / WeUSM
View on GitHub
☆13Mar 30, 2023Updated 3 years ago
walker-hyf / GPT-Talker
View on GitHub
Generative Expressive Conversational Speech Synthesis (Accepted by MM'2024)
☆78Nov 1, 2024Updated last year
lucadellalib / discrete-wavlm-codec
View on GitHub
A neural speech codec based on discrete WavLM representations
☆26Aug 28, 2024Updated last year
ZhikangNiu / Semantic-VAE
View on GitHub
[INTERSPEECH 2026 Oral]Official code for "Semantic-VAE: Semantic-Alignment Latent Representation for Better Speech Synthesis"
☆120Jun 21, 2026Updated last month
uthree / ddsp-vocoder
View on GitHub
☆12Nov 7, 2024Updated last year
zhai-lw / SQCodec
View on GitHub
A lightweight audio codec based on a single quantizer
☆72Aug 15, 2025Updated 11 months ago
fishaudio / vocoder
View on GitHub
☆130Jul 6, 2026Updated 2 weeks ago
Managed Database hosting by DigitalOcean • Ad
PostgreSQL, MySQL, MongoDB, Kafka, Valkey, and OpenSearch available. Automatically scale up storage and focus on building your apps.
ex3ndr / supervoice-hybrid
View on GitHub
My hybrid TTS network that combines, VALL-E, VoiceBox, SpeechFlow, Seamless and TortoiseTTS into one
☆26Aug 5, 2024Updated last year
zhai-lw / L3AC
View on GitHub
A lightweight audio codec based on a single quantizer
☆35Sep 4, 2025Updated 10 months ago
pengzhendong / audio-pipeline
View on GitHub
☆23Oct 17, 2024Updated last year
bfs18 / armel
View on GitHub
poorman's ar-dit tts
☆45Dec 31, 2025Updated 6 months ago
pengzhendong / speaker-diarization
View on GitHub
Offline Speaker Diarization with SenseVoice by Sherpa ONNX.
☆15Dec 23, 2024Updated last year
Takaaki-Saeki / DiscreteSpeechMetrics
View on GitHub
Reference-aware automatic speech evaluation toolkit
☆185Dec 5, 2024Updated last year
MorenoLaQuatra / vad
View on GitHub
Simple voice activity detection (VAD) algorithm in Python
☆15Aug 10, 2023Updated 2 years ago