Awesome TTS
☆63Sep 16, 2021Updated 4 years ago
Alternatives and similar repositories for Awesome-Text-to-Speech-TTS
Users that are interested in Awesome-Text-to-Speech-TTS are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.
Sorting:
- ☆14Aug 19, 2024Updated last year
- Interface for Controllable Expressive Talking Machine☆40Sep 20, 2025Updated 8 months ago
- An implement of GlowTTS model. Several modes are added: speaker embedding, prosody encoder(GST), and gradient reversal.☆55Sep 14, 2022Updated 3 years ago
- Official implementation for the paper Fine-grained style control in transformer-based text-to-speech synthesis.☆90Mar 5, 2022Updated 4 years ago
- Models and codes for INTERSPEECH 2023 paper DistilXLSR: A Light Weight Cross-Lingual Speech Representation Model☆13Mar 30, 2025Updated last year
- Deploy to Railway using AI coding agents - Free Credits Offer • AdUse Claude Code, Codex, OpenCode, and more. Autonomous software development now has the infrastructure to match with Railway.
- ☆43May 19, 2023Updated 3 years ago
- A Compact and Effective Pretrained Model for Speech Emotion Recognition☆54Apr 10, 2026Updated last month
- ☆23Jan 29, 2026Updated 3 months ago
- Awesome list of TTS papers with audio samples☆61Aug 18, 2020Updated 5 years ago
- Finetuning VITS Efficiently☆33Nov 6, 2023Updated 2 years ago
- 60k hours of phoneme-aligned audio from audio books☆19Jul 27, 2024Updated last year
- The official repository for the paper “NonVerbalSpeech-38K: A Scalable Pipeline for Enabling Non-Verbal Speech Generation and Understandi…☆66Dec 26, 2025Updated 5 months ago
- Legible, Scalable, Reproducible Foundation Models with Named Tensors and Jax☆16Jun 16, 2024Updated last year
- a fully open-source implementation of a GPT-4o-like speech-to-speech video understanding model.☆38Apr 7, 2025Updated last year
- GPU virtual machines on DigitalOcean Gradient AI • AdGet to production fast with high-performance AMD and NVIDIA GPUs you can spin up in seconds. The definition of operational simplicity.
- This repository contains the code for the paper "voc2vec: A Foundation Model for Non-Verbal Vocalization", accepted at ICASSP 2025.☆56Apr 14, 2025Updated last year
- Various Text-to-speech (TTS) papers based on Deep-learning☆14Feb 26, 2021Updated 5 years ago
- One command to start a streaming ASR server.☆12Oct 2, 2024Updated last year
- SC-CNN: Effective Speaker Conditioning Method for Zero-Shot Multi-Speaker Text-to-Speech Systems☆39Nov 1, 2023Updated 2 years ago
- 🎵 muse: Music Separation☆11Feb 14, 2024Updated 2 years ago
- A fundamental frequency estimation algorithm using features from the magnitude and phase spectrogram.☆24Mar 29, 2021Updated 5 years ago
- GE2E Speaker Encoder - Generalized End-To-End Loss for Speaker Verification☆14May 17, 2020Updated 6 years ago
- [ICASSP 2026] Task Vector in TTS: Toward Emotionally Expressive Dialectal Speech Synthesis☆39Dec 24, 2025Updated 5 months ago
- A curated list of full-duplex spoken dialogue models & benchmarks☆78May 20, 2026Updated last week
- 1-Click AI Models by DigitalOcean Gradient • AdDeploy popular AI models on DigitalOcean Gradient GPU virtual machines with just a single click. Zero configuration with optimized deployments.
- Glow-TTS with Stochastic Duration Predictor and Stochastic Pitch Predictor☆19Jun 5, 2023Updated 2 years ago
- Multispeaker Community Vocoder Model for DiffSinger☆38Aug 11, 2025Updated 9 months ago
- EMO-SUPERB submission☆51Oct 13, 2025Updated 7 months ago
- Speech-MASSIVE is a multilingual Spoken Language Understanding (SLU) dataset comprising the speech counterpart for a portion of the MASSI…☆24Oct 8, 2025Updated 7 months ago
- [Early Alpha] A unified framework for text-to-speech, voice conversion, automatic speech recognition, audio classification, voice activit…☆22Jan 10, 2025Updated last year
- ☆69Mar 31, 2021Updated 5 years ago
- Tacotron2 with Global Style Tokens☆64Apr 19, 2019Updated 7 years ago
- Predicts the level of noise and reverberation on your audiofiles☆186May 18, 2026Updated last week
- ☆40Nov 18, 2025Updated 6 months ago
- Managed Database hosting by DigitalOcean • AdPostgreSQL, MySQL, MongoDB, Kafka, Valkey, and OpenSearch available. Automatically scale up storage and focus on building your apps.
- acnn for text-independent speaker recognition☆10Feb 8, 2022Updated 4 years ago
- ☆12Mar 11, 2025Updated last year
- PyTorch Implementation of Google's Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions. This implementation supp…☆48Jul 31, 2023Updated 2 years ago
- [INTERSPEECH 2025 Oral]Official code for "Accelerating Diffusion-based Text-to-Speech Model Training with Dual Modality Alignment"☆66Jun 16, 2025Updated 11 months ago
- Official implementation of Meta-StyleSpeech and StyleSpeech☆252Feb 9, 2022Updated 4 years ago
- Python Implementation of Visual Relative Attributes for Image Classification and Zero Shot Learning