freds0 / katube
KATube is a tool that automates the creation of datasets for training Text-To-Speech (TTS) and Speech-To-Text (STT) models. From a list of YouTube playlists or YouTube channels, KATube generates a dataset of audio and text.
☆23 Updated 9 months ago
Alternatives and similar repositories for katube:
Users who are interested in katube are comparing it to the libraries listed below.
- ☆26 Updated last year
- PnG BERT: Augmented BERT on Phonemes and Graphemes for Neural TTS ☆23 Updated 3 years ago
- Finetuning VITS Efficiently ☆32 Updated last year
- This app is intended to automatically create a corpus for ASR systems using pseudo-labeling. ☆27 Updated last year
- (WIP) A retrain of F5-TTS on permissively-licensed data ☆11 Updated last month
- Adaptive Vocoder for Custom Voice ☆59 Updated 2 years ago
- ☆41 Updated last year
- ☆56 Updated 2 years ago
- CML-TTS: A Multilingual Dataset for Speech Synthesis ☆31 Updated 9 months ago
- Autovocoder: Fast Waveform Generation from a Learned Speech Representation using Differentiable Digital Signal Processing ☆70 Updated 2 years ago
- PyTorch Implementation of Google's Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions. This implementation supp… ☆48 Updated last year
- A collection of all our phonemizers for dataset construction and inference ☆22 Updated 2 months ago
- Finally, some decent sample sentences ☆22 Updated last year
- NISQA - Non-Intrusive Speech Quality and TTS Naturalness Assessment ☆16 Updated 3 years ago
- Heteronym to Phoneme Parser ☆18 Updated last year
- [IJCAI'23] Learning to Speak from Text for Low-Resource TTS ☆63 Updated last year
- Speaker embedding for VI-SVC and VI-SVS, also for VITS; use this to replace the ID to implement voice cloning. ☆29 Updated 2 years ago
- Convert English text from written expressions into spoken forms ☆25 Updated 2 years ago
- Just another FastSpeech 2 but cleaner code :) ☆26 Updated 10 months ago
- Create training data for training a voice cloner for bark text to speech. ☆44 Updated last year
- High quality text-to-speech based on StyleTTS 2. ☆37 Updated this week
- ☆69 Updated last year
- SLMGAN: Exploiting Speech Language Model Representations for Unsupervised Zero-Shot Voice Conversion in GANs ☆15 Updated last year
- Generate audio datasets for training Text-To-Speech models, through smart audio splitting with silence detection, and transcription using… ☆28 Updated last year
- ☆24 Updated last year
- A simple voice conversion tool ☆17 Updated 3 years ago
- An espeak-compatible, permissively-licensed IPA phonemizer (G2P) based on DeepPhonemizer. Usable as a drop-in replacement for espeak's GP… ☆95 Updated 6 months ago
- ☆20 Updated 2 years ago
- PyTorch implementation of the paper "MultiSpeech: Multi-Speaker Text to Speech with Transformer" ☆21 Updated 2 years ago
- NU-Wave 2: A General Neural Audio Upsampling Model for Various Sampling Rates [WIP] ☆24 Updated 2 years ago