DagsHub / audio-datasets
open-source audio datasets
☆149 · Updated last year
Alternatives and similar repositories for audio-datasets:
Users who are interested in audio-datasets are comparing it to the repositories listed below:
- Dataset and baseline code for the VocalSound dataset (ICASSP2022). ☆132 · Updated 2 years ago
- A speaker embedding network in PyTorch that is very quick to set up and use for any purpose. ☆87 · Updated this week
- Code for the paper: GAMA: A Large Audio-Language Model with Advanced Audio Understanding and Complex Reasoning Abilities ☆119 · Updated 3 months ago
- PyTorch implementation of deep audio embedding calculation ☆104 · Updated last year
- The official code repo for "Zero-shot Audio Source Separation through Query-based Learning from Weakly-labeled Data", in AAAI 2022 ☆199 · Updated 2 years ago
- The VoxTube dataset official repository ☆68 · Updated last year
- Reproducible experimental protocols for multimedia (audio, video, text) database ☆98 · Updated last month
- Scripts for computing the Intelligibility and CLVP scores for evaluating TTS models ☆152 · Updated last year
- SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition ☆76 · Updated 4 years ago
- A collection of useful audio datasets and transforms for PyTorch. ☆138 · Updated 2 years ago
- HF's ML for Audio study group ☆191 · Updated 2 years ago
- PyTorch implementation of the Perceptual Evaluation of Speech Quality for wideband audio ☆179 · Updated last year
- Repository hosting code and slides of the Audio Data Augmentation series on The Sound of AI YT channel. ☆37 · Updated 3 years ago
- An easy way to fine-tune Wav2Vec 2.0 for low-resource languages. ☆81 · Updated last year
- SA-toolkit: Speaker speech anonymization toolkit in Python ☆23 · Updated last week
- Wav2Vec for speech recognition, classification, and audio classification ☆261 · Updated 3 years ago
- Various speech datasets made available to the public ☆115 · Updated 3 months ago
- Audio transformations library for PyTorch ☆230 · Updated 2 years ago
- Official repository for the "Powerset multi-class cross entropy loss for neural speaker diarization" paper published in Interspeech 2023. ☆81 · Updated last year
- This project performs speaker diarization for the Hindi language. ☆49 · Updated 4 years ago
- SERAB: a multi-lingual benchmark for speech emotion recognition ☆28 · Updated 2 years ago
- [Interspeech 2024] Whisper-Flamingo: Integrating Visual Features into Whisper for Audio-Visual Speech Recognition and Translation ☆145 · Updated last month
- Confidence interval computation for evaluation in machine learning using the bootstrapping approach ☆78 · Updated 11 months ago
- Estimating the Age, Height, and Gender of a speaker with their speech signal. https://arxiv.org/pdf/2110.13653.pdf ☆65 · Updated 3 years ago
- Speakerbox: Fine-tune Audio Transformers for speaker identification. ☆56 · Updated 4 months ago
- Repository containing an experimentation platform for training and running inference on wav2vec2 models. ☆86 · Updated 2 years ago
- This repository aims at providing efficient CNNs for Audio Tagging. We provide AudioSet pre-trained models ready for downstream training … ☆273 · Updated 4 months ago
- Masked Modeling Duo: Towards a Universal Audio Pre-training Framework ☆89 · Updated 8 months ago
- A unified dataset of multilingual emotional human utterances ☆25 · Updated 3 years ago
- Baseline multi-resolution cross network model trained using the Divide and Remaster Dataset ☆80 · Updated last year