XinhaoMei / WavCaps
This repository contains metadata for the WavCaps dataset and code for downstream tasks.
☆253 · Updated last year
Alternatives and similar repositories for WavCaps
Users who are interested in WavCaps are comparing it to the repositories listed below.
- Repository for our NAACL-HLT 2019 paper: AudioCaps ☆202 · Updated 3 months ago
- An Audio Language model for Audio Tasks ☆318 · Updated last year
- Audio Captioning datasets for PyTorch. ☆125 · Updated 5 months ago
- [IJCAI 2024] EAT: Self-Supervised Pre-Training with Efficient Audio Transformer ☆213 · Updated last month
- A 6-million Audio-Caption Paired Dataset Built with a LLMs and ALMs-based Automatic Pipeline ☆193 · Updated last year
- This package aims at simplifying the download of the AudioCaps dataset. ☆36 · Updated 2 years ago
- ☆176 · Updated last year
- AIR-Bench: Benchmarking Large Audio-Language Models via Generative Comprehension ☆125 · Updated last year
- The official repository of SpeechCraft dataset, a large-scale expressive bilingual speech dataset with natural language descriptions. ☆177 · Updated 9 months ago
- A Multi-Task Evaluation Benchmark for Audio-Visual Representation Models (ICASSP 2024) ☆58 · Updated last year
- The dataset and baseline code for Text-to-Audio Grounding (TAG) ☆49 · Updated 2 months ago
- Scripts for downloading AudioSet ☆86 · Updated 8 years ago
- AAAI 2025: Codec Does Matter: Exploring the Semantic Shortcoming of Codec for Audio Language Model ☆286 · Updated 3 months ago
- AudioLDM training, finetuning, evaluation and inference. ☆290 · Updated last year
- ☆43 · Updated 2 years ago
- Code for the paper: GAMA: A Large Audio-Language Model with Advanced Audio Understanding and Complex Reasoning Abilities ☆149 · Updated last year
- Audio captioning recipe ☆51 · Updated 2 months ago
- This repo hosts the code and model of "Separate What You Describe: Language-Queried Audio Source Separation", Interspeech 2022 ☆145 · Updated 2 years ago
- Audio-FLAN ☆160 · Updated 3 months ago
- Models and code for RepCodec: A Speech Representation Codec for Speech Tokenization ☆190 · Updated last year
- Web-crawl for "Audio Retrieval with WavText5K and CLAP Training" ☆50 · Updated 3 years ago
- [NeurIPS 2025] Benchmark data and code for MMAR: A Challenging Benchmark for Deep Reasoning in Speech, Audio, Music, and Their Mix ☆192 · Updated last month
- ☆130 · Updated 4 months ago
- Implementation of our paper 'On Metric Learning For Audio-Text Cross-Modal Retrieval' ☆50 · Updated 3 years ago
- Source for the Interspeech 2024 Paper "Scaling up masked audio encoder learning for general audio classification" ☆79 · Updated 2 months ago
- [ACL 2025 Main] UniCodec: a unified audio codec with a single codebook to support multi-domain audio data, including speech, music, and s… ☆151 · Updated 7 months ago
- [AAAI 2024] V2A-Mapper: A Lightweight Solution for Vision-to-Audio Generation by Connecting Foundation Models ☆27 · Updated 2 years ago
- Source code for the paper 'Audio Captioning Transformer' ☆57 · Updated 3 years ago
- Encoder of BAT (Learning to Reason about Spatial Sounds with Large Language Models) ☆70 · Updated 11 months ago
- Download AudioSet data very quickly with youtube-dl, ffmpeg and Python multiprocessing ☆44 · Updated last year