cdjkim/audiocaps

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/cdjkim/audiocaps)

cdjkim / audiocaps

🔊 Repository for our NAACL-HLT 2019 paper: AudioCaps

☆215

Alternatives and similar repositories for audiocaps

Users that are interested in audiocaps are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

XinhaoMei / WavCaps
View on GitHub
This reporsitory contains metadata of WavCaps dataset and codes for downstream tasks.
☆264Jul 25, 2024Updated last year
XinhaoMei / ACT
View on GitHub
Source code for the paper 'Audio Captioning Transformer'
☆56Jan 18, 2022Updated 4 years ago
JishengBai / AudioSetCaps
View on GitHub
A 6-million Audio-Caption Paired Dataset Built with a LLMs and ALMs-based Automatic Pipeline
☆208Dec 13, 2024Updated last year
audio-captioning / clotho-dataset
View on GitHub
Python code for handling the Clotho dataset.
☆85Nov 24, 2020Updated 5 years ago
wsntxxn / AudioCaption
View on GitHub
Audio captioning recipe
☆53Oct 23, 2025Updated 9 months ago
Simple, predictable pricing with DigitalOcean hosting • Ad
Always know what you'll pay with monthly caps and flat pricing. Enterprise-grade infrastructure trusted by 600k+ customers.
MorenoLaQuatra / audiocaps-download
View on GitHub
This package aims at simplifying the download of the AudioCaps dataset.
☆35Dec 1, 2023Updated 2 years ago
Sreyan88 / RECAP
View on GitHub
Code for ICASSP 2024 Paper: RECAP: Retrieval-Augmented Audio Captioning
☆16Jun 23, 2024Updated 2 years ago
microsoft / CLAP
View on GitHub
Learning audio concepts from natural language supervision
☆672Sep 18, 2024Updated last year
LAION-AI / audio-dataset
View on GitHub
Audio Dataset for training CLAP and other models
☆748Jan 8, 2026Updated 6 months ago
audio-captioning / audio-captioning-resources
View on GitHub
A list of resources that can help in research for automated audio captioning
☆34Feb 17, 2021Updated 5 years ago
LAION-AI / CLAP
View on GitHub
Contrastive Language-Audio Pretraining
☆2,228May 15, 2025Updated last year
Labbeti / aac-datasets
View on GitHub
Audio Captioning datasets for PyTorch.
☆129Mar 25, 2026Updated 3 months ago
audio-captioning / audio-captioning-papers
View on GitHub
A list of papers about audio captioning
☆78Jul 1, 2022Updated 4 years ago
akoepke / audio-retrieval-benchmark
View on GitHub
Code for "Audio Retrieval with Natural Language Queries: A Benchmark Study", Transactions on Multimedia 2022
☆54Jul 16, 2025Updated last year
Managed Database hosting by DigitalOcean • Ad
PostgreSQL, MySQL, MongoDB, Kafka, Valkey, and OpenSearch available. Automatically scale up storage and focus on building your apps.
Sreyan88 / CompA
View on GitHub
Code for ICLR 2024 Paper: CompA: Addressing the Gap in Compositional Reasoning in Audio-Language Models
☆23Jul 10, 2024Updated 2 years ago
yangdongchao / UniAudio
View on GitHub
The Open Source Code of UniAudio
☆605Jul 22, 2024Updated 2 years ago
microsoft / Pengi
View on GitHub
An Audio Language model for Audio Tasks
☆322Apr 19, 2024Updated 2 years ago
hche11 / VGGSound
View on GitHub
VGGSound: A Large-scale Audio-Visual Dataset
☆359Sep 13, 2021Updated 4 years ago
YuanGongND / vocalsound
View on GitHub
Dataset and baseline code for the VocalSound dataset (ICASSP2022).
☆165Nov 12, 2022Updated 3 years ago
XinhaoMei / DCASE2021_task6_v2
View on GitHub
Code for CVSSP submission to DCASE 2021 Task 6
☆36Nov 22, 2022Updated 3 years ago
AmphionTeam / SD-Eval
View on GitHub
[NeurIPS 2024] SD-Eval: A Benchmark Dataset for Spoken Dialogue Understanding Beyond Words
☆57Jun 25, 2024Updated 2 years ago
WangHelin1997 / Aty-TTS
View on GitHub
Aty-TTS: Improving fairness for spoken language understanding in atypical speech with Text-to-Speech
☆11May 14, 2025Updated last year
sony / CLIPSep
View on GitHub
☆43Feb 21, 2023Updated 3 years ago
Deploy on Railway without the complexity - Free Credits Offer • Ad
Connect your repo and Railway handles the rest with instant previews. Quickly provision container image services, databases, and storage volumes.
Audio-AGI / FlowSep
View on GitHub
Official implementation for FlowSep
☆77Jan 2, 2025Updated last year
haoheliu / AudioLDM-training-finetuning
View on GitHub
AudioLDM training, finetuning, evaluation and inference.
☆304Dec 13, 2024Updated last year
NAVER-INTEL-Co-Lab / gaudi-lavcap
View on GitHub
☆15Jan 24, 2025Updated last year
YuanGongND / ltu
View on GitHub
Code, Dataset, and Pretrained Models for Audio and Speech Large Language Model "Listen, Think, and Understand".
☆478Apr 24, 2024Updated 2 years ago
ajd12342 / paraspeechcaps
View on GitHub
Codebase for 'Scaling Rich Style-Prompted Text-to-Speech Datasets'
☆163Mar 26, 2026Updated 3 months ago
facebookresearch / AudioMAE
View on GitHub
This repo hosts the code and models of "Masked Autoencoders that Listen".
☆673Apr 5, 2024Updated 2 years ago
AndreyGuzhov / AudioCLIP
View on GitHub
Source code for models described in the paper "AudioCLIP: Extending CLIP to Image, Text and Audio" (https://arxiv.org/abs/2106.13043)
☆872Sep 30, 2021Updated 4 years ago
bytedance / SALMONN
View on GitHub
SALMONN family: A suite of advanced multi-modal LLMs
☆1,479Jul 16, 2026Updated last week
sony / soundctm
View on GitHub
Pytorch implementation of SoundCTM
☆101Mar 31, 2025Updated last year
Managed Database hosting by DigitalOcean • Ad
PostgreSQL, MySQL, MongoDB, Kafka, Valkey, and OpenSearch available. Automatically scale up storage and focus on building your apps.
Stability-AI / stable-audio-metrics
View on GitHub
Metrics for evaluating music and audio generative models – with a focus on long-form, full-band, and stereo generations.
☆300Updated this week
OFA-Sys / AIR-Bench
View on GitHub
AIR-Bench: Benchmarking Large Audio-Language Models via Generative Comprehension
☆133Dec 9, 2024Updated last year
ddlBoJack / MMAR
View on GitHub
[NeurIPS 2025] Benchmark data and code for MMAR: A Challenging Benchmark for Deep Reasoning in Speech, Audio, Music, and Their Mix
☆214Feb 25, 2026Updated 4 months ago
haoheliu / audioldm_eval
View on GitHub
This toolbox aims to unify audio generation model evaluation for easier comparison.
☆390Sep 29, 2024Updated last year
slSeanWU / beats-conformer-bart-audio-captioner
View on GitHub
PyTorch implementation of the ICASSP-24 paper: "Improving Audio Captioning Models with Fine-grained Audio Features, Text Embedding Superv…
☆41Jan 6, 2024Updated 2 years ago
blmoistawinde / fense
View on GitHub
Fluency ENhanced Sentence-bert Evaluation (FENSE), metric for audio caption evaluation. And Benchmark dataset AudioCaps-Eval, Clotho-Eval…
☆21Feb 1, 2023Updated 3 years ago
haoheliu / AudioLDM
View on GitHub
AudioLDM: Generate speech, sound effects, music and beyond, with text.
☆2,908Jun 25, 2025Updated last year