WavEncoder is a Python library for encoding audio signals, applying transforms for audio augmentation, and training audio classification models with a PyTorch backend.
☆92 · Jun 6, 2021 · Updated 4 years ago
Alternatives and similar repositories for wavencoder
Users that are interested in wavencoder are comparing it to the libraries listed below
- The History of Speech Recognition to the Year 2030 ☆13 · Aug 14, 2021 · Updated 4 years ago
- Estimating the Age, Height, and Gender of a speaker with their speech signal. https://arxiv.org/pdf/2110.13653.pdf ☆68 · May 12, 2021 · Updated 4 years ago
- Code for DeCoAR (ICASSP 2020) and BERTphone (Odyssey 2020) ☆104 · Nov 26, 2022 · Updated 3 years ago
- Efficient Speech Processing Toolkit for Automatic Speaker Recognition ☆17 · Feb 8, 2023 · Updated 3 years ago
- The Additive Margin SincNet (AM-SincNet) is a new approach for speaker recognition problems which is based on the neural network architec… ☆46 · Oct 3, 2023 · Updated 2 years ago
- Docker image and scripts for training finetuned or completely personal Kaldi speech models. Particularly for use with kaldi-active-gramma… ☆21 · Jan 24, 2022 · Updated 4 years ago
- Wav2Vec for speech recognition, classification, and audio classification ☆273 · Apr 2, 2022 · Updated 3 years ago
- ☆61 · Jan 31, 2023 · Updated 3 years ago
- ☆11 · Nov 5, 2021 · Updated 4 years ago
- Deep Discriminative Embeddings for Duration Robust Speaker Verification ☆19 · Dec 16, 2019 · Updated 6 years ago
- A library for speech data augmentation in time-domain ☆684 · Aug 30, 2021 · Updated 4 years ago
- [ICASSP'23] Online speaker clustering ☆17 · Feb 22, 2026 · Updated last month
- ☆229 · Nov 13, 2023 · Updated 2 years ago
- A repository comprising code for generating noisy speech data from clean data using deep learning methods ☆16 · Jul 12, 2021 · Updated 4 years ago
- ☆26 · Jun 5, 2024 · Updated last year
- Implementation of different noise embeddings for noise aware training of Kaldi acoustic models. ☆13 · Feb 13, 2021 · Updated 5 years ago
- Speaker embedding (d-vector) trained with GE2E loss ☆286 · Jan 8, 2024 · Updated 2 years ago
- Onset-and-Offset-Aware Sound Event Detection ☆21 · Feb 10, 2025 · Updated last year
- Python package for combining diarization system outputs. ☆92 · Oct 12, 2023 · Updated 2 years ago
- Web page for ISCA Special Interest Group: Robust Speech Processing (RoSP) ☆11 · Dec 4, 2023 · Updated 2 years ago
- The Additive Margin MobileNet1D is a new lightweight deep learning model for Speaker Recognition which is based on the MobileNetV2 archi… ☆30 · Oct 3, 2023 · Updated 2 years ago
- Code for the Paper Speech Recognition and Multi-Speaker Diarization of Long Conversations ☆38 · Jun 12, 2023 · Updated 2 years ago
- [InterSpeech 2020] "AutoSpeech: Neural Architecture Search for Speaker Recognition" by Shaojin Ding*, Tianlong Chen*, Xinyu Gong, Weiwei … ☆209 · Dec 8, 2022 · Updated 3 years ago
- Python wrapper for kaldi's arpa2fst ☆38 · Aug 27, 2025 · Updated 6 months ago
- A data annotation pipeline to generate high-quality, large-scale speech datasets with machine pre-labeling and fully manual auditing. ☆106 · Mar 25, 2023 · Updated 2 years ago
- Memory efficient transducer loss computation ☆70 · Jun 10, 2022 · Updated 3 years ago
- Simplified recipes for preparing commonly used speech datasets, and a PyTorch-compatible Python data loader that can perform standard fea… ☆15 · Jun 12, 2023 · Updated 2 years ago
- ☆28 · Oct 7, 2025 · Updated 5 months ago
- ☆67 · Aug 16, 2023 · Updated 2 years ago
- The official implementation of the method discussed in the paper Improving Spoken Language Identification with Map-Mix (work accepted at I… ☆18 · Feb 17, 2023 · Updated 3 years ago
- VCTK multi-speaker tacotron for ICASSP 2020 ☆266 · Mar 29, 2022 · Updated 3 years ago
- Pre-training Cross-modal Transformer for Audio-and-Language Representations ☆38 · Apr 20, 2021 · Updated 4 years ago
- A toolkit for non-parallel voice conversion based on vector-quantized variational autoencoder ☆171 · Jul 25, 2024 · Updated last year
- [ASRU 2021] Efficient Conformer: Progressive Downsampling and Grouped Attention for Automatic Speech Recognition ☆219 · Jun 22, 2023 · Updated 2 years ago
- ☆76 · Oct 25, 2021 · Updated 4 years ago
- ERISHA is a multilingual multispeaker expressive speech synthesis framework. It can transfer the expressivity to the speaker's voice for… ☆44 · Dec 17, 2020 · Updated 5 years ago
- Torch-based tool for quantizing high-dimensional vectors using additive codebooks ☆54 · May 25, 2022 · Updated 3 years ago
- Modular and extensible speech recognition library leveraging pytorch-lightning and hydra. ☆50 · May 19, 2021 · Updated 4 years ago
- Self-Supervised Speech Pre-training and Representation Learning Toolkit ☆2,536 · Mar 12, 2026 · Updated last week