freds0/katube

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/freds0/katube)

freds0 / katube

KATube is a tool to automate the process of creating datasets for training Text-To-Speech (TTS) and Speech-To-Text (STT) models. From a list of YouTube playlists or YouTube channels, KATube will generate dataset with audios and texts.

☆26

Alternatives and similar repositories for katube

Users that are interested in katube are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

miccio-dk / NISQA
View on GitHub
NISQA - Non-Intrusive Speech Quality and TTS Naturalness Assessment
☆16Apr 13, 2022Updated 4 years ago
BridgetteSong / ExpressiveTacotron
View on GitHub
This repository provides a multi-mode and multi-speaker expressive speech synthesis framework, including multi-attentive Tacotron, DurIAN…
☆74Sep 21, 2022Updated 3 years ago
rishikksh20 / Avocodo-pytorch
View on GitHub
Avocodo: Generative Adversarial Network for Artifact-free Vocoder
☆122Jul 14, 2022Updated 4 years ago
gpu-poor / gramvaani_hindi_asr
View on GitHub
This repo contains the baseline model recipes and pre-trained model for GramVanni hindi ASR challenge
☆16Mar 26, 2022Updated 4 years ago
ex3ndr / supervoice-hybrid
View on GitHub
My hybrid TTS network that combines, VALL-E, VoiceBox, SpeechFlow, Seamless and TortoiseTTS into one
☆26Aug 5, 2024Updated last year
GPUs on demand by Runpod - Special Offer Available • Ad
Run AI, ML, and HPC workloads on powerful cloud GPUs—without limits or wasted spend. Deploy GPUs in under a minute and pay by the second.
bryan051003 / USVG
View on GitHub
A unified model for zero-shot singing voice conversion and synthesis
☆22Nov 30, 2022Updated 3 years ago
ajinkyakulkarni14 / ERISHA
View on GitHub
ERISHA is a mulitilingual multispeaker expressive speech synthesis framework. It can transfer the expressivity to the speaker's voice for…
☆44Dec 17, 2020Updated 5 years ago
lwang114 / UnsupTTS
View on GitHub
☆37Mar 26, 2024Updated 2 years ago
TanUkkii007 / deepvoice3-tensorflow
View on GitHub
A tensorflow based implementation of DeepVoice3 https://arxiv.org/abs/1710.07654
☆13Jun 5, 2018Updated 8 years ago
keonlee9420 / Daft-Exprt
View on GitHub
PyTorch Implementation of Daft-Exprt: Robust Prosody Transfer Across Speakers for Expressive Speech Synthesis
☆55Oct 15, 2021Updated 4 years ago
MuyangDu / T5Voice
View on GitHub
T5Voice is a lightweight PyTorch implementation of T5-based text-to-speech synthesis, supporting both streaming and non-streaming speech …
☆28Nov 7, 2025Updated 8 months ago
rishikksh20 / iSTFT-Avocodo-pytorch
View on GitHub
Ultrafast GAN based Vocoder for Text to Speech
☆50Jul 16, 2022Updated 4 years ago
voidful / wav2vec2-xlsr-multilingual-56
View on GitHub
56 language, 1 model Multilingual ASR
☆25Jul 25, 2021Updated 4 years ago
neonbjb / BigListOfPodcasts
View on GitHub
A list of podcast URLs scraped from the Apple podcast database in late 2021, including a script for downloading those podcasts.
☆44Mar 9, 2022Updated 4 years ago
Managed Database hosting by DigitalOcean • Ad
PostgreSQL, MySQL, MongoDB, Kafka, Valkey, and OpenSearch available. Automatically scale up storage and focus on building your apps.
Labmem-Zhouyx / CDFSE_FastSpeech2
View on GitHub
The Official Implementation of “Content-Dependent Fine-Grained Speaker Embedding for Zero-Shot Speaker Adaptation in Text-to-Speech Synth…
☆86Dec 20, 2022Updated 3 years ago
yuan1615 / AdaVocoder
View on GitHub
Adaptive Vocoder for Custom Voice
☆61Sep 22, 2022Updated 3 years ago
CODEJIN / VITS_Diffusion
View on GitHub
☆26Sep 22, 2022Updated 3 years ago
keonlee9420 / WaveGrad2
View on GitHub
PyTorch Implementation of Google Brain's WaveGrad 2: Iterative Refinement for Text-to-Speech Synthesis
☆68Aug 3, 2021Updated 4 years ago
cschaefer26 / StyleMelGAN
View on GitHub
☆10Apr 8, 2024Updated 2 years ago
coqui-ai / data-checker
View on GitHub
🫠 check your data, before you wreck your model
☆16Aug 11, 2022Updated 3 years ago
krylm / whisper-event-tuning
View on GitHub
Final training script from HuggingFace Whisper Fine tuning event - to get best results on finetuned model.
☆12Dec 24, 2022Updated 3 years ago
MiniXC / LightningFastSpeech2
View on GitHub
☆55Jan 13, 2023Updated 3 years ago
speechio / asr-noises
View on GitHub
A handy dataset of noises for ASR
☆22May 29, 2019Updated 7 years ago
GPU virtual machines on DigitalOcean Gradient AI • Ad
Get to production fast with high-performance AMD and NVIDIA GPUs you can spin up in seconds. The definition of operational simplicity.
keonlee9420 / Comprehensive-Transformer-TTS
View on GitHub
A Non-Autoregressive Transformer based Text-to-Speech, supporting a family of SOTA transformers with supervised and unsupervised duration…
☆328Sep 24, 2022Updated 3 years ago
freds0 / kabooks
View on GitHub
KABooks is a tool to automate the process of creating datasets for training Text-To-Speech (TTS) and Speech-To-Text (STT) models. Using a…
☆13Mar 24, 2023Updated 3 years ago
dipjyoti92 / SC-WaveRNN
View on GitHub
Official PyTorch implementation of Speaker Conditional WaveRNN
☆110Jun 22, 2022Updated 4 years ago
tts-tutorial / icassp2022
View on GitHub
☆64May 23, 2022Updated 4 years ago
igormq / speech2text
View on GitHub
☆12Feb 9, 2021Updated 5 years ago
MTG / PodcastMix-inference
View on GitHub
☆32Jan 6, 2022Updated 4 years ago
erogol / TTS_tf
View on GitHub
WIP Tensorflow implementation of https://github.com/mozilla/TTS
☆15Apr 11, 2020Updated 6 years ago
rendchevi / nix-tts
View on GitHub
🐤 Nix-TTS: Lightweight and End-to-end Text-to-Speech via Module-wise Distillation
☆264Nov 15, 2025Updated 8 months ago
audiodemo / voice-conversion
View on GitHub
Vocoder-Free Non-Parallel Conversion of Whispered Speech With Masked Cycle-Consistent Generative Adversarial Networks
☆17Aug 18, 2023Updated 2 years ago
GPUs on demand by Runpod - Special Offer Available • Ad
Run AI, ML, and HPC workloads on powerful cloud GPUs—without limits or wasted spend. Deploy GPUs in under a minute and pay by the second.
sotaque-brasileiro / sotaque-brasileiro
View on GitHub
Uma base de dados para estudo de regionalismos brasileiros através da voz.
☆11May 2, 2023Updated 3 years ago
Aria-K-Alethia / speaking-rate-controllable-hifi-gan
View on GitHub
☆16Apr 4, 2022Updated 4 years ago
egorsmkv / asr-corpus-creator
View on GitHub
This app is intended to automatically create a corpus for ASR systems using pseudo-labeling.
☆27Feb 15, 2024Updated 2 years ago
supertone-inc / super-monotonic-align
View on GitHub
☆173Sep 19, 2024Updated last year
fanlu / wenet
View on GitHub
Transformer based ASR Engine.
☆13Aug 23, 2021Updated 4 years ago
ncsoft / avocodo
View on GitHub
Official implementation of "Avocodo: Generative Adversarial Network for Artifact-Free Vocoder" (AAAI2023)
☆154Feb 1, 2023Updated 3 years ago
dobby-seo / Wav2Keyword
View on GitHub
Wav2Keyword is keyword spotting(KWS) based on Wav2Vec 2.0. This model shows state-of-the-art in Speech commands dataset V1 and V2.
☆110Jan 11, 2023Updated 3 years ago