WavEncoder is a Python library for encoding audio signals, transforms for audio augmentation, and training audio classification models with PyTorch backend.
☆92Jun 6, 2021Updated 5 years ago
Alternatives and similar repositories for wavencoder
Users that are interested in wavencoder are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.
Sorting:
- The History of Speech Recognition to the Year 2030☆13Aug 14, 2021Updated 4 years ago
- Estimating the Age, Height, and Gender of a speaker with their speech signal. https://arxiv.org/pdf/2110.13653.pdf☆68May 12, 2021Updated 5 years ago
- Code for DeCoAR (ICASSP 2020) and BERTphone (Odyssey 2020)☆104Nov 26, 2022Updated 3 years ago
- Efficient Speech Processing Tookit for Automatic Speaker Recognition☆17Feb 8, 2023Updated 3 years ago
- The Additive Margin SincNet (AM-SincNet) is a new approach for speaker recognition problems which is based in the neural network architec…☆46Oct 3, 2023Updated 2 years ago
- Deploy to Railway using AI coding agents - Free Credits Offer • AdUse Claude Code, Codex, OpenCode, and more. Autonomous software development now has the infrastructure to match with Railway.
- Docker image and scripts for training finetuned or completely personal Kaldi speech models. Particularly for use with kaldi-active-gramma…☆21Jan 24, 2022Updated 4 years ago
- Wav2Vec for speech recognition, classification, and audio classification☆275Apr 2, 2022Updated 4 years ago
- ☆61Jan 31, 2023Updated 3 years ago
- ☆11Nov 5, 2021Updated 4 years ago
- Deep Discriminative Embeddings for Duration Robust Speaker Verification☆19Dec 16, 2019Updated 6 years ago
- A library for speech data augmentation in time-domain☆688Aug 30, 2021Updated 4 years ago
- ☆231Nov 13, 2023Updated 2 years ago
- [ICASSP'23] Online speaker clustering☆18Feb 22, 2026Updated 3 months ago
- A repository comprising of code for generation of noisy speech data from clean data using deep learning methods☆16Jul 12, 2021Updated 4 years ago
- Managed hosting for WordPress and PHP on Cloudways • AdManaged hosting for WordPress, Magento, Laravel, or PHP apps, on multiple cloud providers. Deploy in minutes on Cloudways by DigitalOcean.
- ☆26Jun 5, 2024Updated 2 years ago
- Implementation of different noise embeddings for noise aware training of Kaldi acoustic models.☆13Feb 13, 2021Updated 5 years ago
- Speaker embedding (d-vector) trained with GE2E loss☆289Jan 8, 2024Updated 2 years ago
- Onset-and-Offset-Aware Sound Event Detection☆21Feb 10, 2025Updated last year
- Python package for combining diarization system outputs.☆94Oct 12, 2023Updated 2 years ago
- Web page for ISCA Special Interest Group: Robust Speech Processing (RoSP)☆11Dec 4, 2023Updated 2 years ago
- The Additive Margin MobileNet1D is a new light weight deep learning model for Speaker Recognition which is based on the MobileNetV2 archi…☆31Oct 3, 2023Updated 2 years ago
- Code for the Paper Speech Recognition and Multi-Speaker Diarization of Long Conversations☆38Jun 12, 2023Updated 3 years ago
- [InterSpeech 2020] "AutoSpeech: Neural Architecture Search for Speaker Recognition" by Shaojin Ding*, Tianlong Chen*, Xinyu Gong, Weiwei …☆207Dec 8, 2022Updated 3 years ago
- GPUs on demand by Runpod - Special Offer Available • AdRun AI, ML, and HPC workloads on powerful cloud GPUs—without limits or wasted spend. Deploy GPUs in under a minute and pay by the second.
- Python wrapper for kaldi's arpa2fst☆38Aug 27, 2025Updated 9 months ago
- Memory efficient transducer loss computation☆70Jun 10, 2022Updated 4 years ago
- Simplified recipes for preparing commonly used speech datasets, and a PyTorch-compatible Python data loader that can perform standard fea…☆16Jun 2, 2026Updated last week
- ☆28Apr 24, 2026Updated last month
- ☆68Aug 16, 2023Updated 2 years ago
- Pre-training Cross-modal Transformer for Audio-and-Language Representations☆39Apr 20, 2021Updated 5 years ago
- VCTK multi-speaker tacotron for ICASSP 2020☆266Mar 29, 2022Updated 4 years ago
- [ASRU 2021] Efficient Conformer: Progressive Downsampling and Grouped Attention for Automatic Speech Recognition☆220Jun 22, 2023Updated 2 years ago
- The official implementation of the method discussed in the paper Improving Spoken Language Identification with Map-Mix(work accepted at I…☆18Feb 17, 2023Updated 3 years ago
- Deploy on Railway without the complexity - Free Credits Offer • AdConnect your repo and Railway handles the rest with instant previews. Quickly provision container image services, databases, and storage volumes.
- A toolkit for non-parallel voice conversion based on vector-quantized variational autoencoder☆171Jul 25, 2024Updated last year
- ☆77Oct 25, 2021Updated 4 years ago
- ERISHA is a mulitilingual multispeaker expressive speech synthesis framework. It can transfer the expressivity to the speaker's voice for…☆44Dec 17, 2020Updated 5 years ago
- Torch-based tool for quantizing high-dimensional vectors using additive codebooks☆54May 25, 2022Updated 4 years ago
- Modular and extensible speech recognition library leveraging pytorch-lightning and hydra.☆50May 19, 2021Updated 5 years ago
- Self-Supervised Speech Pre-training and Representation Learning Toolkit☆2,553Mar 12, 2026Updated 3 months ago
- Raw waveform adaptation with SincNet☆12Mar 19, 2024Updated 2 years ago