Automatically transcribe audio/video into high-quality, speaker-specific Text-To-Speech datasets
☆137 · Aug 10, 2025 · Updated 10 months ago
Alternatives and similar repositories for TTSizer
Users that are interested in TTSizer are comparing it to the libraries listed below.
- Try to replicate the architecture of MiniMaxTTS mentioned in its technical report ☆47 · Sep 2, 2025 · Updated 9 months ago
- ☆101 · Jan 19, 2026 · Updated 4 months ago
- ☆41 · Jul 15, 2025 · Updated 10 months ago
- My hybrid TTS network that combines VALL-E, VoiceBox, SpeechFlow, Seamless and TortoiseTTS into one ☆26 · Aug 5, 2024 · Updated last year
- Incremental Disentanglement for Environment-Aware Zero-Shot Text-to-Speech Synthesis ☆27 · Mar 21, 2025 · Updated last year
- poorman's ar-dit tts ☆45 · Dec 31, 2025 · Updated 5 months ago
- High quality text-to-speech based on StyleTTS 2. ☆77 · Apr 6, 2026 · Updated 2 months ago
- [ACL 2025] OZSpeech: One-step Zero-shot Speech Synthesis with Learned-Prior-Conditioned Flow Matching ☆45 · Feb 9, 2025 · Updated last year
- ☆25 · Mar 6, 2024 · Updated 2 years ago
- ☆23 · Feb 14, 2026 · Updated 3 months ago
- Codebase for 'Scaling Rich Style-Prompted Text-to-Speech Datasets' ☆161 · Mar 26, 2026 · Updated 2 months ago
- Official implementation of "Automatic Tuning of Loss Trade-offs without Hyper-parameter Search in End-to-End Zero-Shot Speech Synthesis",… ☆80 · May 29, 2023 · Updated 3 years ago
- FlashCosyVoice: A lightweight vLLM implementation built from scratch for CosyVoice. ☆250 · Feb 25, 2026 · Updated 3 months ago
- The open source code of ALMTokenizer2: Towards Low bit-rate and Semantic-rich Audio Tokenizer with Flow-based Scalar Diffusion Transforme… ☆45 · Sep 5, 2025 · Updated 9 months ago
- Official implementation of the paper "BigCodec: Pushing the Limits of Low-Bitrate Neural Speech Codec" ☆217 · Sep 19, 2024 · Updated last year
- ☆15 · Nov 11, 2024 · Updated last year
- GPT-style network for phonemization with durations of text ☆69 · Mar 21, 2024 · Updated 2 years ago
- [ICASSP 2024] This is the official code for "VoiceFlow: Efficient Text-to-Speech with Rectified Flow Matching" ☆374 · Sep 3, 2024 · Updated last year
- This repository implements a novel zero-shot TTS framework, named Flamed-TTS, focusing on the efficient generation and dynamic pacing in … ☆57 · Aug 9, 2025 · Updated 10 months ago
- The demo page for ALMTokenizer ☆59 · Apr 14, 2025 · Updated last year
- 5Hz Deep-Compression Speech VAE for AR-Diffusion and CALMs ☆57 · Nov 19, 2025 · Updated 6 months ago
- ☆19 · Mar 22, 2024 · Updated 2 years ago
- NU-Wave 2: A General Neural Audio Upsampling Model for Various Sampling Rates [WIP] ☆25 · Jul 5, 2022 · Updated 3 years ago
- ☆26 · Sep 22, 2022 · Updated 3 years ago
- All generative model in one for better TTS model ☆74 · Sep 8, 2024 · Updated last year
- Train the next generation of TTS systems. ☆170 · Sep 13, 2024 · Updated last year
- [NAACL 2025] WaveFM: A High-Fidelity and Efficient Vocoder Based on Flow Matching ☆126 · Apr 8, 2026 · Updated 2 months ago
- ☆12 · Nov 7, 2024 · Updated last year
- text to speech ☆10 · Mar 19, 2024 · Updated 2 years ago
- ☆18 · Feb 9, 2020 · Updated 6 years ago
- Parallel waveform generation with DiffusionGAN ☆17 · Mar 26, 2022 · Updated 4 years ago
- T5Voice is a lightweight PyTorch implementation of T5-based text-to-speech synthesis, supporting both streaming and non-streaming speech … ☆28 · Nov 7, 2025 · Updated 7 months ago
- Official implementation of the paper "Laughter Synthesis using Pseudo Phonetic Tokens with a Large-scale In-the-wild Laughter Corpus" acc… ☆77 · Jul 16, 2023 · Updated 2 years ago
- This repository contains a series of works on diffusion-based speech tokenizers, including the official implementation of the paper: "TaD… ☆198 · Jan 25, 2026 · Updated 4 months ago
- Inference code for Interspeech 2025 paper, "LSCodec: Low-Bitrate and Speaker-Decoupled Discrete Speech Codec" ☆36 · Oct 23, 2025 · Updated 7 months ago
- VoxInstruct: Expressive Human Instruction-to-Speech Generation with Unified Multilingual Codec Language Modelling ☆100 · Nov 9, 2024 · Updated last year
- Tidy Tunes is an easy-to-use pipeline for mining high-quality audio data for speech generation models. To do so, it chains multiple open … ☆23 · May 19, 2026 · Updated 3 weeks ago
- GPT for FACodec ☆13 · Mar 25, 2024 · Updated 2 years ago
- ☆36 · Sep 6, 2025 · Updated 9 months ago