espnet / notebook
☆63 Updated last month
Related projects
Alternatives and complementary repositories for notebook
- Speaker change detection using SincNet and an LSTM/Transformer ☆44 Updated 4 months ago
- A mini, simple, and fast end-to-end automatic speech recognition toolkit. ☆47 Updated last year
- Official repository for the "Powerset multi-class cross entropy loss for neural speaker diarization" paper published in Interspeech 2023. ☆71 Updated last year
- ☆50 Updated last year
- Zero-shot Domain-sensitive Speech Recognition with Prompt-conditioning Fine-tuning (ASRU2023) ☆26 Updated last year
- Dataset and baseline code for the VocalSound dataset (ICASSP2022). ☆123 Updated 2 years ago
- Clustering-based methods for overlapping diarization ☆70 Updated 10 months ago
- Speaker identification/verification models for Machine Learning for Computer Vision class at UNIBO ☆58 Updated 2 years ago
- A speaker embedding network in PyTorch that is very quick to set up and use for whatever purposes. ☆86 Updated last year
- Zero-shot multimodal punctuation insertion and truecasing using Whisper ☆99 Updated last year
- An easy way to fine-tune Wav2Vec 2.0 for low-resource languages. ☆81 Updated last year
- A sequence-to-sequence voice conversion toolkit. ☆86 Updated 4 months ago
- Code and data repository for paper "VoxCeleb enrichment for Age and Gender recognition" submitted at ASRU 2021 ☆64 Updated 2 years ago
- ☆160 Updated 2 years ago
- This is the Python library for an unsupervised, fast method for robust voice activity detection (rVAD), as in the paper rVAD: An Unsuperv… ☆126 Updated 3 weeks ago
- An espeak-compatible, permissively-licensed IPA phonemizer (G2P) based on DeepPhonemizer. Usable as a drop-in replacement for espeak's GP… ☆83 Updated last month
- A package for crawling audio from YouTube ☆41 Updated last year
- Phoneme segmentation using pre-trained speech models ☆54 Updated 2 years ago
- NOTSOFAR-1 Challenge: Distant Diarization and ASR ☆44 Updated last week
- Reproducible experimental protocols for multimedia (audio, video, text) database ☆84 Updated last month
- Example code for a neural transducer model. ☆60 Updated 9 months ago
- Estimating the Age, Height, and Gender of a speaker with their speech signal. https://arxiv.org/pdf/2110.13653.pdf ☆64 Updated 3 years ago
- Transcribing Speech with Multinomial Diffusion, training code and models. ☆75 Updated last year
- ☆37 Updated 3 years ago
- This is a list of speech tasks and datasets, which can provide training data for Generative AI, AIGC, AI model training, intelligent spee… ☆72 Updated 5 months ago
- The VoxTube dataset official repository ☆61 Updated 9 months ago
- PyTorch implementation of RNN-Transducer(RNN-T). ☆72 Updated 3 years ago
- A Python library for voice activity detection (VAD) for speech/non-speech segmentation. ☆83 Updated 2 years ago
- Various speech datasets made available to the public ☆99 Updated last month