MorenoLaQuatra / audioset-download
This package aims at simplifying the download of the AudioSet dataset.
☆56 · Jul 17, 2025 · Updated 7 months ago
Alternatives and similar repositories for audioset-download
Users interested in audioset-download are comparing it to the libraries listed below:
- Download audioset data super fastly with youtube-dl, ffmpeg and python multiprocessing ☆45 · Aug 1, 2024 · Updated last year
- Toolkit for downloading and processing Google's AudioSet dataset. ☆175 · Aug 22, 2025 · Updated 5 months ago
- This package aims at simplifying the download of the AudioCaps dataset. ☆36 · Dec 1, 2023 · Updated 2 years ago
- A spoken version of the textual story cloze benchmark ☆20 · Aug 6, 2023 · Updated 2 years ago
- WildDESED: A LLM-Powered Dataset for Wild Domestic Environment Sound Event Detection ☆17 · Nov 19, 2024 · Updated last year
- FastSAG: Towards Fast Non-Autoregressive Singing Accompaniment Generation ☆28 · Dec 19, 2024 · Updated last year
- [ICASSP'24] Investigating Personalization Methods in Text to Music Generation ☆45 · Mar 27, 2024 · Updated last year
- semantic tokenizer for speech and music ☆21 · Jul 6, 2025 · Updated 7 months ago
- AudioLDM training, finetuning, evaluation and inference. ☆296 · Dec 13, 2024 · Updated last year
- ☆14 · Jun 16, 2023 · Updated 2 years ago
- Understanding and Tackling Hallucinations in Large Audio-Language Models | ICASSP 2025, Interspeech 2024 ☆32 · Mar 14, 2025 · Updated 11 months ago
- Code to train a custom time-domain autoencoder to dereverb audio ☆16 · Nov 30, 2023 · Updated 2 years ago
- The official repo for Both Ears Wide Open: Towards Language-Driven Spatial Audio Generation ☆59 · Jul 2, 2025 · Updated 7 months ago
- provide SPHERE-formatted output as well as RIFF, AU, AIFF and raw ☆14 · Dec 18, 2021 · Updated 4 years ago
- Code for ICASSP 2024 Paper: RECAP: Retrieval-Augmented Audio Captioning ☆16 · Jun 23, 2024 · Updated last year
- Generate accompaniment part with chords using Evolutionary algorithm. ☆11 · May 8, 2022 · Updated 3 years ago
- A 6-million Audio-Caption Paired Dataset Built with a LLMs and ALMs-based Automatic Pipeline ☆195 · Dec 13, 2024 · Updated last year
- Accompanying code for paper "Attention-Based Contextual Language Model Adaptation for Speech Recognition", submitted to ACL 2021. ☆14 · Jul 25, 2023 · Updated 2 years ago
- SLT 2024 Mandarin Stuttering Event Detection and Automatic Speech Recognition Challenge ☆12 · Jun 11, 2024 · Updated last year
- Official code of "N-Gram Unsupervised Compoundation and Feature Injection for Better Symbolic Music Understanding" ☆14 · Apr 10, 2024 · Updated last year
- [IJCAI 2024] EAT: Self-Supervised Pre-Training with Efficient Audio Transformer ☆221 · Nov 30, 2025 · Updated 2 months ago
- This is the official train-dev-test release of the Interspeech2024 Discrete Speech Representation Challenge. ☆32 · Jan 26, 2024 · Updated 2 years ago
- Unofficial implementation of FSD50k baselines for Sound Event Recognition ☆26 · Apr 27, 2024 · Updated last year
- 🦇 Encoder of BAT (Learning to Reason about Spatial Sounds with Large Language Models) ☆73 · Feb 13, 2025 · Updated last year
- ☆23 · Feb 2, 2022 · Updated 4 years ago
- ☆117 · Updated this week
- Simple voice activity detection (VAD) algorithm in Python ☆15 · Aug 10, 2023 · Updated 2 years ago
- Efficient Training for Multilingual Visual Speech Recognition: Pre-training with Discretized Visual Speech Representation (ACM MM 2024) ☆20 · Mar 17, 2025 · Updated 11 months ago
- ☆37 · Jul 4, 2024 · Updated last year
- Pre-training BART model for the Italian Language ☆16 · Dec 28, 2022 · Updated 3 years ago
- Code for ICLR 2024 Paper: CompA: Addressing the Gap in Compositional Reasoning in Audio-Language Models ☆22 · Jul 10, 2024 · Updated last year
- PyTorch implementation of the ICASSP-24 paper: "Improving Audio Captioning Models with Fine-grained Audio Features, Text Embedding Superv… ☆38 · Jan 6, 2024 · Updated 2 years ago
- The official repository for the paper “NonVerbalSpeech-38K: A Scalable Pipeline for Enabling Non-Verbal Speech Generation and Understandi… ☆63 · Dec 26, 2025 · Updated last month
- ☆32 · Dec 24, 2025 · Updated last month
- [EMNLP 2025 Findings] Official code for EZ-VC: Easy Zero-shot Any-to-Any Voice Conversion ☆33 · Sep 9, 2025 · Updated 5 months ago
- ☆18 · May 4, 2025 · Updated 9 months ago
- Transformer-based visually grounded speech models ☆19 · Sep 22, 2022 · Updated 3 years ago
- Official implementation of the paper "BigCodec: Pushing the Limits of Low-Bitrate Neural Speech Codec" ☆212 · Sep 19, 2024 · Updated last year
- Official implementation for the paper: A Unified One-Shot Prosody and Speaker Conversion System with Self-Supervised Discrete Speech Unit… ☆83 · Jan 7, 2023 · Updated 3 years ago