XinhaoMei / WavCaps
This repository contains metadata for the WavCaps dataset and code for downstream tasks.
☆237 · Updated 11 months ago
Alternatives and similar repositories for WavCaps
Users who are interested in WavCaps are comparing it to the libraries listed below.
- Audio Captioning datasets for PyTorch. ☆121 · Updated this week
- Repository for our NAACL-HLT 2019 paper: AudioCaps ☆176 · Updated 4 months ago
- AIR-Bench: Benchmarking Large Audio-Language Models via Generative Comprehension ☆106 · Updated 7 months ago
- [IJCAI 2024] EAT: Self-Supervised Pre-Training with Efficient Audio Transformer ☆166 · Updated 2 weeks ago
- An Audio Language model for Audio Tasks ☆309 · Updated last year
- This package aims at simplifying the download of the AudioCaps dataset. ☆35 · Updated last year
- Scripts for downloading AudioSet ☆78 · Updated 7 years ago
- A 6-million Audio-Caption Paired Dataset Built with an LLM- and ALM-based Automatic Pipeline ☆163 · Updated 7 months ago
- ☆169 · Updated last year
- Implementation of our paper 'On Metric Learning For Audio-Text Cross-Modal Retrieval' ☆46 · Updated 3 years ago
- Benchmark data and code for MMAR: A Challenging Benchmark for Deep Reasoning in Speech, Audio, Music, and Their Mix ☆143 · Updated last month
- Source code for the paper 'Audio Captioning Transformer' ☆54 · Updated 3 years ago
- AAAI 2025: Codec Does Matter: Exploring the Semantic Shortcoming of Codec for Audio Language Model ☆226 · Updated 2 months ago
- A Multi-Task Evaluation Benchmark for Audio-Visual Representation Models (ICASSP 2024) ☆54 · Updated last year
- Code for the paper: GAMA: A Large Audio-Language Model with Advanced Audio Understanding and Complex Reasoning Abilities ☆134 · Updated 7 months ago
- Source code for "Synchformer: Efficient Synchronization from Sparse Cues" (ICASSP 2024) ☆70 · Updated 5 months ago
- The official repository of SpeechCraft dataset, a large-scale expressive bilingual speech dataset with natural language descriptions. ☆158 · Updated 2 months ago
- AudioBench: A Universal Benchmark for Audio Large Language Models ☆231 · Updated 3 weeks ago
- AudioLDM training, finetuning, evaluation and inference. ☆261 · Updated 7 months ago
- ☆41 · Updated 2 years ago
- The dataset and baseline code for Text-to-Audio Grounding (TAG) ☆42 · Updated last week
- Audio-Visual Corruption Modeling of our paper "Watch or Listen: Robust Audio-Visual Speech Recognition with Visual Corruption Modeling an… ☆34 · Updated 2 years ago
- ☆86 · Updated last week
- This repo hosts the code and model of "Separate What You Describe: Language-Queried Audio Source Separation", Interspeech 2022 ☆145 · Updated last year
- Models and code for RepCodec: A Speech Representation Codec for Speech Tokenization ☆179 · Updated last year
- Code, Dataset, and Pretrained Models for Audio and Speech Large Language Model "Listen, Think, and Understand". ☆441 · Updated last year
- Audio-FLAN ☆157 · Updated 4 months ago
- Official Implementation of EnCLAP (ICASSP 2024) ☆92 · Updated last year
- Code and Pretrained Models for ICLR 2023 Paper "Contrastive Audio-Visual Masked Autoencoder". ☆264 · Updated last year
- Encoder of BAT (Learning to Reason about Spatial Sounds with Large Language Models) ☆57 · Updated 5 months ago