jonatasgrosman / wav2vec2-sprint
☆179 · Updated 2 years ago
Alternatives and similar repositories for wav2vec2-sprint:
Users interested in wav2vec2-sprint are comparing it to the libraries listed below.
- HuggingSound: A toolkit for speech-related tasks based on Hugging Face's tools ☆440 · Updated last year
- [NeurIPS'22] Squeezeformer: An Efficient Transformer for Automatic Speech Recognition ☆246 · Updated last year
- An easy way to fine-tune Wav2Vec 2.0 for low-resource languages. ☆81 · Updated last year
- phoneme tokenizer and grapheme-to-phoneme model for 8k languages ☆151 · Updated last year
- Simplified diarization pipeline using some pretrained models - audio file to diarized segments in a few lines of code ☆144 · Updated 8 months ago
- Variational Bayes HMM over x-vectors diarization ☆260 · Updated last year
- Wav2Vec for speech recognition, classification, and audio classification ☆253 · Updated 2 years ago
- Segment an audio file and obtain utterance alignments. (Python package) ☆325 · Updated 8 months ago
- Few-shot Keyword Spotting in Any Language and Multilingual Spoken Word Corpus ☆170 · Updated last month
- PyTorch Implementation of Non-autoregressive Expressive (emotional, conversational) TTS based on FastSpeech2, supporting English, Korean,… ☆293 · Updated 3 years ago
- A Non-Autoregressive Transformer based Text-to-Speech, supporting a family of SOTA transformers with supervised and unsupervised duration… ☆324 · Updated 2 years ago
- Multilingual G2P in 100 languages ☆295 · Updated last year
- A tokenizer, text cleaner, and phonemizer for many human languages. ☆295 · Updated 2 months ago
- This is the GitHub page for publicly available emotional speech data. ☆330 · Updated 3 years ago
- Large, modern dataset for speech recognition ☆656 · Updated 10 months ago
- NeMo text processing for ASR and TTS ☆297 · Updated last week
- A large-scale multilingual speech corpus for representation learning, semi-supervised learning and interpretation ☆520 · Updated last year
- PyTorch Implementation of FastSpeech 2: Fast and High-Quality End-to-End Text to Speech ☆228 · Updated 2 years ago
- Wav2Keyword is a keyword spotting (KWS) model based on Wav2Vec 2.0. It achieves state-of-the-art results on the Speech Commands dataset V1 and V2. ☆102 · Updated 2 years ago
- Kaldi-compatible online & offline feature extraction with PyTorch, supporting CUDA, batch processing, chunk processing, and autograd - P… ☆193 · Updated last month
- Speaker embedding (d-vector) trained with GE2E loss ☆273 · Updated last year
- Phoneme Recognition using pre-trained models Wav2vec2, HuBERT and WavLM. Throughout this project, we compared specifically three differen… ☆216 · Updated 2 years ago
- Estimating the Age, Height, and Gender of a speaker with their speech signal. https://arxiv.org/pdf/2110.13653.pdf ☆64 · Updated 3 years ago
- ☆38 · Updated 3 years ago
- Speaker identification/verification models for Machine Learning for Computer Vision class at UNIBO ☆60 · Updated 2 years ago
- Various speech datasets made available to the public ☆107 · Updated last month
- AdaSpeech: Adaptive Text to Speech for Custom Voice ☆156 · Updated 3 years ago
- CVSS: A Massively Multilingual Speech-to-Speech Translation Corpus ☆193 · Updated 2 years ago
- A lightweight library to compute Diarization Error Rate (DER). ☆59 · Updated last year
- PyTorch code implementation of EfficientSpeech - to be presented at ICASSP2023. ☆158 · Updated 10 months ago