thorstenMueller / Audio-to-Voice-Dataset
Create an LJSpeech-structured voice dataset from WAV input
★27, updated 6 months ago
Alternatives and similar repositories for Audio-to-Voice-Dataset:
Users who are interested in Audio-to-Voice-Dataset are comparing it to the repositories listed below:
- Provide Gradio custom components to make the diarization-based audio labeling process easier and faster. (★61, updated 3 weeks ago)
- Daisy-TTS: Simulating Wider Spectrum of Emotions via Prosody Embedding Decomposition (★16, updated last year)
- Google's SoundStorm: Efficient Parallel Audio Generation (★131, updated last year)
- Go from raw audio files to a text-audio dataset automatically with OpenAI's Whisper. (★135, updated last year)
- An espeak-compatible, permissively-licensed IPA phonemizer (G2P) based on DeepPhonemizer. Usable as a drop-in replacement for espeak's GP… (★95, updated 5 months ago)
- (★84, updated this week)
- VoiceBox neural network implementation (★105, updated 8 months ago)
- (★104, updated this week)
- Official repository for the "Powerset multi-class cross entropy loss for neural speaker diarization" paper published in Interspeech 2023. (★81, updated last year)
- Use quantized versions of Whisper to speed up inference (★12, updated 5 months ago)
- Automatically cleaning, enhancing, segmenting, filtering, and formatting a dataset to fine tune or train a voice model. (★32, updated this week)
- Open TTS models, built for streaming on the edge (★39, updated 2 weeks ago)
- Collection of Open Source Speech Data (★152, updated 4 months ago)
- Trying to build an all in one speech-text language model - a bit like GPT-4o (★22, updated 10 months ago)
- Create labeled datasets, enhance audio quality, identify speakers, support diverse dataset types. Advanced audio processing. (★243, updated 9 months ago)
- An unofficial PyTorch implementation of VALL-E (★87, updated this week)
- Zero-shot Audio Classification using Whisper (★80, updated 2 years ago)
- Companion repo for the paper "PixIT: Joint Training of Speaker Diarization and Speech Separation from Real-world Multi-speaker Recordings… (★82, updated 2 months ago)
- StyleTTS-ZS: Efficient High-Quality Zero-Shot Text-to-Speech Synthesis with Distilled Time-Varying Style Diffusion (★173, updated 6 months ago)
- VALL-E 2 reproduction (★123, updated 8 months ago)
- Speaker change detection using SincNet and an LSTM/Transformer (★48, updated 9 months ago)
- Audio tokenization, in the fastest way possible! (★49, updated 7 months ago)
- Speaker Diarization with Transformers (★64, updated 10 months ago)
- A Massive Multilingual Multi-speaker Speech Corpus for Scaling Indian TTS (★36, updated 3 months ago)
- Scripts for computing the Intelligibility and CLVP scores for evaluating TTS models (★152, updated last year)
- Efficient approach to speaker diarization using voice characteristics extraction (★93, updated 11 months ago)
- PyTorch code implementation of EfficientSpeech - to be presented at ICASSP2023. (★166, updated last year)
- A TTS model that makes a speaker speak new languages (★76, updated 9 months ago)
- Simplified diarization pipeline using some pretrained models - audio file to diarized segments in a few lines of code (★147, updated 11 months ago)
- This repository contains the code and data for the paper EmoKnob: Enhance Voice Cloning with Fine-Grained Emotion Control by Haozhe Chen,… (★68, updated 6 months ago)