alumae / torch-xvectors-wav
☆22 Updated 3 years ago
Alternatives and similar repositories for torch-xvectors-wav:
Users who are interested in torch-xvectors-wav are comparing it to the libraries listed below:
- A handy dataset of noises for ASR ☆20 Updated 5 years ago
- ☆33 Updated 3 years ago
- Video cut powered by AI ☆25 Updated 2 years ago
- Conditional Variational Auto-Encoder with Jointly Training FastSpeech2 and HiFi-GAN for End to End Text to Speech ☆22 Updated 2 years ago
- NISQA - Non-Intrusive Speech Quality and TTS Naturalness Assessment ☆16 Updated 2 years ago
- Prosodic Speech Segmentation with Transformers ☆25 Updated last year
- Open Source Speech/Text Data on AI ☆18 Updated 2 years ago
- Multipurpose Multi Speaker Mixture Signal Generator ☆44 Updated last month
- ☆56 Updated 2 years ago
- Vocoder-Free Non-Parallel Conversion of Whispered Speech With Masked Cycle-Consistent Generative Adversarial Networks ☆17 Updated last year
- ☆17 Updated last year
- ☆12 Updated 3 years ago
- Implementation of different noise embeddings for noise-aware training of Kaldi acoustic models. ☆13 Updated 4 years ago
- Semi-supervised Learning for Multi-speaker Text-to-speech Synthesis Using Discrete Speech Representation ☆39 Updated 4 years ago
- NU-Wave 2: A General Neural Audio Upsampling Model for Various Sampling Rates [WIP] ☆24 Updated 2 years ago
- PyPI-installable TDNN and TDNN-F layers for PyTorch-based acoustic model training ☆39 Updated 4 years ago
- This is an extension of the Kaldi speech recognition software that allows decoding of speech with hybrid word and phoneme graphs.… ☆11 Updated 5 years ago
- Small compression utility ☆35 Updated 8 months ago
- ☆20 Updated 6 years ago
- An evaluation set for large-scale trained TTS models (Coming in Sep 2024) ☆12 Updated 6 months ago
- SpeechGLUE is a speech version of the GLUE benchmark, driven by text-to-speech. ☆13 Updated last year
- ☆20 Updated 5 years ago
- ☆16 Updated 2 years ago
- 60k hours of phoneme-aligned audio from audiobooks ☆18 Updated 7 months ago
- ☆17 Updated 3 years ago
- This repository contains all the code necessary for running the multilingual distilwhisper from the Ferraz et al. 2024 IEEE ICASSP paper. ☆20 Updated last year
- Audio generation model working with GPT-2 and VQVAE compressed representation of mel spectrograms ☆18 Updated last year
- Pronunciation-assisted Subword Modeling ☆29 Updated 5 years ago
- Source Code for the Paper "UNIFIED KEYWORD SPOTTING AND AUDIO TAGGING ON MOBILE DEVICES WITH TRANSFORMERS" ☆23 Updated 2 years ago