zhaoyanpeng / audioset-dl
Download AudioSet for Vision-Audio-Text Pre-training
☆12 · Updated 2 years ago
Related projects:
- VIsually-Pivoted Audio and(N) Text · ☆20 · Updated 2 years ago
- A list of resources that can help in research for automated audio captioning · ☆34 · Updated 3 years ago
- PyTorch implementation of the paper "Masked Spectrogram Prediction for Self-Supervised Audio Pre-Training" · ☆39 · Updated last year
- ☆32 · Updated 3 years ago
- ☆30 · Updated 2 years ago
- COLA contrastive pre-training method implemented in PyTorch · ☆42 · Updated 3 years ago
- ☆47 · Updated last year
- Download and create a tfreader for the AudioSet dataset · ☆17 · Updated 4 years ago
- Language modelling for sound event detection · ☆21 · Updated 4 years ago
- Composing General Audio Representation by Fusing Multilayer Features of a Pre-trained Model · ☆26 · Updated last year
- Submission to MediaEval 2021 Emotions and Themes in Music challenge. Noisy-student training for music emotion tagging · ☆11 · Updated 2 years ago
- Emotion detection in audio utilising self-supervised representations trained with Contrastive Predictive Coding (CPC) · ☆41 · Updated 2 years ago
- ☆57 · Updated 3 years ago
- System that ranked 2nd in DCASE 2022 Challenge Task 5: Few-shot Bioacoustic Event Detection · ☆27 · Updated 2 years ago
- Audio captioning baseline system for DCASE 2020 challenge · ☆37 · Updated last year
- Learning differentiable temporal resolution on time-series data · ☆33 · Updated last year
- Python code for handling the Clotho dataset · ☆74 · Updated 3 years ago
- Pre-training Cross-modal Transformer for Audio-and-Language Representations · ☆39 · Updated 3 years ago
- ☆35 · Updated 2 years ago
- Simple baseline model for the HEAR benchmark · ☆22 · Updated last month
- A list of papers about audio captioning · ☆77 · Updated 2 years ago
- PyTorch implementation of "Audio Retrieval with Natural Language Queries" (INTERSPEECH 2021) · ☆27 · Updated last year
- Dataset and baseline for the first Audiocaption task · ☆78 · Updated last month
- Official code for Interspeech 2023 paper "Self-supervised Fine-tuning for Improved Content Representations by Speaker-invariant Clusterin… · ☆41 · Updated last year
- Official implementation of the paper "How to Listen? Rethinking Visual Sound Localization" · ☆16 · Updated 2 years ago
- Audio generation model using GPT-2 and a VQ-VAE-compressed representation of mel spectrograms · ☆18 · Updated 11 months ago
- Zero-shot Learning for Audio-based Music Classification and Tagging (ISMIR 2019) · ☆40 · Updated 4 years ago
- Follows NVIDIA's implementation, simplified and with data-parallel support · ☆13 · Updated 4 years ago
- Code for the paper "Improving Sound Event Classification by Increasing Shift Invariance in Convolutional Neural Networks" · ☆13 · Updated last year
- ☆52 · Updated 3 years ago