oliverguhr / wav2vec2-live
Live speech recognition using Facebook's wav2vec 2.0 model.
☆341Updated last year
Alternatives and similar repositories for wav2vec2-live:
Users who are interested in wav2vec2-live are comparing it to the libraries listed below.
- Simplified diarization pipeline using some pretrained models - audio file to diarized segments in a few lines of code☆145Updated 9 months ago
- Grapheme to phoneme conversion with deep learning.☆376Updated last year
- Segment an audio file and obtain utterance alignments. (Python package)☆328Updated 9 months ago
- PyTorch Implementation of FastSpeech 2: Fast and High-Quality End-to-End Text to Speech☆229Updated 2 years ago
- A tokenizer, text cleaner, and phonemizer for many human languages.☆303Updated 3 months ago
- Performant and accurate speech recognition built on PyTorch☆252Updated 2 years ago
- ESPnet Model Zoo☆245Updated last year
- Large, modern dataset for speech recognition☆663Updated 11 months ago
- VoiceSplit: Targeted Voice Separation by Speaker-Conditioned Spectrogram☆240Updated 6 months ago
- Variational Bayes HMM over x-vectors diarization☆263Updated last year
- Allosaurus is a pretrained universal phone recognizer for more than 2000 languages☆601Updated 9 months ago
- A fast and lightweight Python-based CTC beam search decoder for speech recognition.☆436Updated last year
- A Non-Autoregressive Transformer based Text-to-Speech, supporting a family of SOTA transformers with supervised and unsupervised duration…☆324Updated 2 years ago
- On-device voice activity detection (VAD) powered by deep learning☆198Updated this week
- PyTorch Implementation of Non-autoregressive Expressive (emotional, conversational) TTS based on FastSpeech2, supporting English, Korean,…☆295Updated 3 years ago
- A Generative Flow for Text-to-Speech via Monotonic Alignment Search☆681Updated 2 years ago
- A large-scale multilingual speech corpus for representation learning, semi-supervised learning and interpretation☆524Updated last year
- Phoneme Recognition using pre-trained models Wav2vec2, HuBERT and WavLM. Throughout this project, we compared specifically three differen…☆220Updated 2 years ago
- 🐸STT integration examples☆125Updated 2 years ago
- YourTTS: Towards Zero-Shot Multi-Speaker TTS and Zero-Shot Voice Conversion for everyone☆939Updated 3 months ago
- Diarization scoring tools.☆235Updated last year
- UniSpeech - Large Scale Self-Supervised Learning for Speech☆449Updated 10 months ago
- Multilingual G2P in 100 languages☆299Updated last year
- ☆350Updated 11 months ago
- Fine-tune and evaluate Whisper models for Automatic Speech Recognition (ASR) on custom datasets or datasets from huggingface.☆286Updated last year
- Speaker embedding (d-vector) trained with GE2E loss☆276Updated last year
- A fully working PyTorch implementation of NaturalSpeech (Tan et al., 2022)☆474Updated last year
- End-to-End Neural Diarization☆395Updated 3 years ago
- This is the GitHub page for publicly available emotional speech data.☆336Updated 3 years ago
- Wav2Vec for speech recognition, classification, and audio classification☆259Updated 2 years ago