rhasspy / espeak-phonemizer
Uses ctypes and libespeak-ng to transform text into IPA phonemes
☆20 · Updated last year
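As a rough illustration of the ctypes-plus-libespeak-ng approach named in the description, the sketch below loads the shared library and asks it for IPA. It is not taken from the espeak-phonemizer source, which may work differently. The helper name `text_to_ipa`, the default voice `en-us`, and the constant names are assumptions chosen for this example; the constant values and the three `espeak_*` calls follow espeak-ng's `speak_lib.h`.

```python
import ctypes
import ctypes.util

# Values as defined in espeak-ng's speak_lib.h (names here are our own).
AUDIO_OUTPUT_SYNCHRONOUS = 0x02  # synchronous mode, no audio playback
ESPEAK_CHARS_UTF8 = 1            # input text is UTF-8 encoded
PHONEME_MODE_IPA = 0x02          # bit 1 set: IPA output instead of ASCII names


def text_to_ipa(text, voice="en-us"):
    """Hypothetical helper: IPA string for `text`, or None if espeak-ng is unusable."""
    lib_name = ctypes.util.find_library("espeak-ng")
    if lib_name is None:
        return None  # libespeak-ng is not installed
    try:
        lib = ctypes.CDLL(lib_name)
    except OSError:
        return None

    # Declare the C signatures so ctypes converts arguments correctly.
    lib.espeak_Initialize.argtypes = [
        ctypes.c_int, ctypes.c_int, ctypes.c_char_p, ctypes.c_int]
    lib.espeak_Initialize.restype = ctypes.c_int
    lib.espeak_SetVoiceByName.argtypes = [ctypes.c_char_p]
    lib.espeak_SetVoiceByName.restype = ctypes.c_int
    lib.espeak_TextToPhonemes.argtypes = [
        ctypes.POINTER(ctypes.c_char_p), ctypes.c_int, ctypes.c_int]
    lib.espeak_TextToPhonemes.restype = ctypes.c_char_p

    # A negative return value signals an initialization error.
    if lib.espeak_Initialize(AUDIO_OUTPUT_SYNCHRONOUS, 0, None, 0) < 0:
        return None
    # 0 is EE_OK; anything else means the voice could not be selected.
    if lib.espeak_SetVoiceByName(voice.encode("utf-8")) != 0:
        return None

    data = text.encode("utf-8")       # keep a reference so the buffer stays alive
    text_ptr = ctypes.c_char_p(data)
    clauses = []
    # Each call translates one clause and advances text_ptr;
    # the library sets the pointer to NULL once the text is exhausted.
    while text_ptr:
        phonemes = lib.espeak_TextToPhonemes(
            ctypes.byref(text_ptr), ESPEAK_CHARS_UTF8, PHONEME_MODE_IPA)
        if phonemes:
            clauses.append(phonemes.decode("utf-8").strip())
    return " ".join(clauses)


if __name__ == "__main__":
    print(text_to_ipa("Hello world"))
```

The helper returns `None` rather than raising when the library or voice is unavailable, so callers can fall back to another phonemizer.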
Related projects:
- ☆75 · Updated 3 months ago
- This is the official repository for the HUI-Audio-Corpus-German. The corresponding paper is in the process of publication. With the repo… ☆26 · Updated last year
- A sequence-to-sequence voice conversion toolkit. ☆84 · Updated 2 months ago
- NVIDIA's FastPitch, extracted from the DeepLearningExamples repository ☆10 · Updated 5 months ago
- A toolkit to calculate speech audio quality. Not affiliated with the original authors ☆26 · Updated last month
- Official implementation of "Unsupervised Pre-training for Data-Efficient Text-to-Speech on Low Resource Languages", ICASSP 2023 ☆27 · Updated last year
- [IJCAI'23] Learning to Speak from Text for Low-Resource TTS ☆64 · Updated last year
- ☆62 · Updated 4 months ago
- Baseline Recipe for VoicePrivacy Challenge 2024: anonymization systems and evaluation software ☆37 · Updated 3 months ago
- a curated list of speech datasets (110+ datasets, 75+ easy to download) ☆76 · Updated last year
- Code for our INTERSPEECH paper Simul-Whisper: Attention-Guided Streaming Whisper with Truncation Detection ☆37 · Updated last month
- Convert English text from written expressions into spoken forms ☆19 · Updated 2 years ago
- Unofficial implementation of miipher ☆104 · Updated 5 months ago
- Deep Neural Pitch Extractor for Voice Conversion and TTS Training ☆117 · Updated 2 years ago
- Zero-shot multimodal punctuation insertion and truecasing using Whisper ☆95 · Updated last year
- Dataset of ICASSP 2021 MULTILINGUAL PHONETIC DATASET FOR LOW RESOURCE SPEECH RECOGNITION ☆34 · Updated last year
- [AAAI 2024] Code for CTX-vec2wav in UniCATS ☆115 · Updated 3 months ago
- VoicePAT is a modular and efficient toolkit for voice privacy research, with main focus on speaker anonymization. ☆45 · Updated 4 months ago
- Clustering-based methods for overlapping diarization ☆68 · Updated 8 months ago
- Official implementation of the paper "Laughter Synthesis using Pseudo Phonetic Tokens with a Large-scale In-the-wild Laughter Corpus" acc… ☆70 · Updated last year
- ☆16 · Updated 3 years ago
- Autovocoder: Fast Waveform Generation from a Learned Speech Representation using Differentiable Digital Signal Processing ☆67 · Updated last year
- Companion repo for the paper "PixIT: Joint Training of Speaker Diarization and Speech Separation from Real-world Multi-speaker Recordings… ☆24 · Updated 3 months ago
- Speaker change detection using SincNet and an LSTM/Transformer ☆39 · Updated 2 months ago
- ☆31 · Updated last year
- Online streaming speaker change detection model in Pytorch ☆34 · Updated last year
- Byte-based multilingual transformer TTS for low-resource/few-shot language adaptation. ☆89 · Updated 2 years ago
- ☆69 · Updated last year
- UTokyo-SaruLab MOS Prediction System ☆49 · Updated this week
- HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis ☆37 · Updated 3 years ago