serp-ai/ai-text-to-audio-latent-diffusion

serp-ai / ai-text-to-audio-latent-diffusion

text-to-audio-latent-diffusion

☆36

Alternatives and similar repositories for ai-text-to-audio-latent-diffusion

Users interested in ai-text-to-audio-latent-diffusion are comparing it to the libraries listed below.

cpii-cai / PunCantonese
A Benchmark Corpus for Low-Resource Cantonese Punctuation Restoration from Speech Transcripts
☆15 · Dec 3, 2024 · Updated last year
KrishnaDN / BERTphone
Implementation of the paper "BERTphone: Phonetically-aware Encoder Representations for Utterance-level Speaker and Language Recognition"
☆17 · Dec 10, 2020 · Updated 5 years ago
reppy4620 / convnext_tts
Unofficial implementation of ConvNeXt-TTS powered by lightning
☆18 · Oct 20, 2024 · Updated last year
hanghuacs / MMComposition
☆17 · Jun 20, 2025 · Updated last year
Infinity-INF / fast-phasr
Phonemes and durations labeling based on whisper small
☆11 · Jul 7, 2024 · Updated 2 years ago
diontimmer / sample-diffusion-gui
GUI toolkit using various audio diffusion repos.
☆76 · Jul 27, 2023 · Updated 2 years ago
NeuralNotW0rk / LoRAW
Flexible LoRA Implementation to use with stable-audio-tools
☆84 · Sep 9, 2024 · Updated last year
angel-01 / image-to-scene
☆10 · Apr 10, 2021 · Updated 5 years ago
jing-bi / awesome-M.LLM-reasoning
☆20 · May 11, 2025 · Updated last year
p1an-lin-jung / wv_tts
☆19 · Mar 22, 2024 · Updated 2 years ago
vinesmsuic / ipainter-diffusion
Official Code for "Intelligent Painter: Picture Composition With Resampling Diffusion Model" (ICIP 2023)
☆16 · Jun 23, 2023 · Updated 3 years ago
shengcanxu / canoSpeech
Text to speech
☆10 · Mar 19, 2024 · Updated 2 years ago
shashankshirol / GeneratingNoisySpeechData
A repository comprising code for generating noisy speech data from clean data using deep learning methods
☆16 · Jul 12, 2021 · Updated 5 years ago
sushant-t / tts-trainer
Generate audio datasets for training Text-To-Speech models, through smart audio splitting with silence detection, and transcription using…
☆30 · May 27, 2023 · Updated 3 years ago
CODEJIN / VITS_Diffusion
☆26 · Sep 22, 2022 · Updated 3 years ago
ZehuaKcrissLi / GTR-Voice
☆16 · Nov 11, 2024 · Updated last year
lifeiteng / TTS-TextAnalyzer
TTS Text Analyzer
☆31 · Jul 20, 2023 · Updated 3 years ago
ionite34 / h2p-parser
Heteronym to Phoneme Parser
☆19 · Nov 4, 2023 · Updated 2 years ago
CookiePPP / podcast_rss_feeds
List of podcast feeds gathered using the iTunes API, and a script to download ~6,000,000 hours of English speech.
☆31 · Apr 13, 2023 · Updated 3 years ago
karchkha / MelSpec_GPT_VQVAE
Audio generation model working with GPT-2 and a VQVAE-compressed representation of mel spectrograms
☆18 · Oct 8, 2023 · Updated 2 years ago
yunlong10 / VidComposition
[CVPR 2025] VidComposition: Can MLLMs Analyze Compositions in Compiled Videos?
☆30 · May 10, 2025 · Updated last year
Top34051 / stargan-zsvc
Unofficial PyTorch Implementation of StarGAN-ZSVC
☆14 · Aug 5, 2021 · Updated 4 years ago
fakerybakery / OpenF5-TTS
(WIP) A retrain of F5-TTS on permissively-licensed data
☆14 · Apr 6, 2025 · Updated last year
CODEJIN / XiaoiceSing2
☆19 · Feb 2, 2023 · Updated 3 years ago
flamed-tts / Flamed-TTS
This repository implements a novel zero-shot TTS framework, named Flamed-TTS, focusing on the efficient generation and dynamic pacing in …
☆57 · Aug 9, 2025 · Updated 11 months ago
jagilley / autodrummer
A text-to-audio model for generating text-conditioned drum beats
☆21 · Apr 25, 2023 · Updated 3 years ago
mshahbazi72 / NeRF-GAN-Distillation
☆20 · Mar 29, 2023 · Updated 3 years ago
gteu / realtime-ppg-vc
Voice conversion model for real-time speech synthesis using PPG (Phonetic PosteriorGram) as an intermediate feature, written in PyTorch.
☆29 · Mar 3, 2022 · Updated 4 years ago
choiHkk / Transformer-TTS-V2
☆25 · Mar 6, 2024 · Updated 2 years ago
declare-lab / HyperTTS
☆40 · Apr 15, 2024 · Updated 2 years ago
yunlong10 / Video-R4
Reinforcing Text-Rich Video Reasoning with Visual Rumination
☆28 · Jun 5, 2026 · Updated last month
WikiChao / DAVIS
[🏆 IJCV 2025 & ACCV 2024 Best Paper Honorable Mention] Official PyTorch implementation of the paper "High-Quality Visually-Guided Sound …
☆33 · Mar 30, 2026 · Updated 3 months ago
hecko-yes / tts-dataset-prompts
Finally, some decent sample sentences
☆24 · Dec 3, 2023 · Updated 2 years ago
WikiChao / FreSca
[CVPR 2025 GMCV] Test-Time Frequency Scaling: Instant Frequency Control for Any Diffusion Model
☆55 · May 31, 2025 · Updated last year
drscotthawley / aeiou
(ML) audio engineering i/o utils
☆55 · Mar 31, 2025 · Updated last year
ilaria-manco / word2wave
Word2Wave: a framework for generating short audio samples from a text prompt using WaveGAN and COALA.
☆118 · Dec 13, 2021 · Updated 4 years ago
Scarfmonster / HiFiPLN
Multispeaker Community Vocoder Model for DiffSinger
☆39 · Aug 11, 2025 · Updated 11 months ago
zhai-lw / L3AC
A lightweight audio codec based on a single quantizer
☆35 · Sep 4, 2025 · Updated 10 months ago
tonnetonne814 / PL-Bert-VITS2
VITS2 using Phoneme-Level Japanese BERT
☆14 · Dec 17, 2023 · Updated 2 years ago