Labbeti/aac-metrics

Metrics for evaluating Automated Audio Captioning systems, designed for PyTorch.

☆75

Alternatives and similar repositories for aac-metrics

Users who are interested in aac-metrics are comparing it to the libraries listed below.

felixgontier / dcase-2023-baseline
☆14 · Mar 25, 2023 · Updated 3 years ago
microsoft / AudioEntailment
Audio Entailment: Deductive Reasoning for Audio Understanding
☆17 · Dec 10, 2024 · Updated last year
audio-captioning / caption-evaluation-tools
Tools for the evaluation of audio captioning.
☆19 · May 23, 2020 · Updated 6 years ago
Labbeti / conette-audio-captioning
CoNeTTE: An efficient Audio Captioning system leveraging multiple datasets with Task Embedding
☆23 · Dec 17, 2025 · Updated 7 months ago
soham97 / mellow
Small audio language model for reasoning
☆88 · Dec 4, 2025 · Updated 7 months ago
qiuqiangkong / mini_llm
☆29 · Jul 4, 2025 · Updated last year
Sreyan88 / CompA
Code for ICLR 2024 Paper: CompA: Addressing the Gap in Compositional Reasoning in Audio-Language Models
☆23 · Jul 10, 2024 · Updated 2 years ago
v-manhlt3 / m-LTM-Audio-Text-Retrieval
☆13 · Jan 5, 2025 · Updated last year
XinhaoMei / WavCaps
This repository contains metadata of the WavCaps dataset and code for downstream tasks.
☆264 · Jul 25, 2024 · Updated last year
Labbeti / aac-datasets
Audio Captioning datasets for PyTorch.
☆129 · Mar 25, 2026 · Updated 3 months ago
AMAAI-Lab / JamendoMaxCaps
JamendoMaxCaps is a large-scale dataset of 362,000 instrumental Creative Commons tracks
☆53 · May 24, 2025 · Updated last year
qiuqiangkong / music_llm
☆56 · Jul 13, 2025 · Updated last year
ddlBoJack / MMAR
[NeurIPS 2025] Benchmark data and code for MMAR: A Challenging Benchmark for Deep Reasoning in Speech, Audio, Music, and Their Mix
☆214 · Feb 25, 2026 · Updated 4 months ago
DCASE2024-Task7-Sound-Scene-Synthesis / AudioLDM-training-finetuning
AudioLDM training, finetuning, evaluation and inference.
☆14 · Mar 27, 2024 · Updated 2 years ago
XinhaoMei / DCASE2021_task6_v2
Code for CVSSP submission to DCASE 2021 Task 6
☆36 · Nov 22, 2022 · Updated 3 years ago
wsntxxn / AudioCaption
Audio captioning recipe
☆53 · Oct 23, 2025 · Updated 8 months ago
hongfeixue / StutteringSpeechChallenge
SLT 2024 Mandarin Stuttering Event Detection and Automatic Speech Recognition Challenge
☆12 · Jun 11, 2024 · Updated 2 years ago
YuanGongND / ltu
Code, Dataset, and Pretrained Models for Audio and Speech Large Language Model "Listen, Think, and Understand".
☆478 · Apr 24, 2024 · Updated 2 years ago
zszheng147 / Spatial-AST
🦇 Encoder of BAT (Learning to Reason about Spatial Sounds with Large Language Models)
☆87 · Feb 13, 2025 · Updated last year
qiuqiangkong / audioflow
☆128 · Updated this week
liuxubo717 / cl4ac
Code for "CL4AC: A Contrastive Loss for Audio Captioning", DCASE Workshop 2021.
☆45 · Oct 8, 2021 · Updated 4 years ago
kyegomez / AudioFlamingo
Implementation of the model "AudioFlamingo" from the paper: "Audio Flamingo: A Novel Audio Language Model with Few-Shot Learning and Dial…
☆39 · Jan 27, 2025 · Updated last year
Audio-AGI / dcase2024_task9_baseline
Baseline for DCASE 2024 Task 9: "Language-Queried Audio Source Separation"
☆26 · Mar 27, 2024 · Updated 2 years ago
slSeanWU / beats-conformer-bart-audio-captioner
PyTorch implementation of the ICASSP-24 paper: "Improving Audio Captioning Models with Fine-grained Audio Features, Text Embedding Superv…
☆41 · Jan 6, 2024 · Updated 2 years ago
h-munakata / Lighthouse-Wrapper-for-Audio-Moment-Retrieval
☆13 · Mar 23, 2026 · Updated 3 months ago
kuan2jiu99 / audio-hallucination
Understanding and Tackling Hallucinations in Large Audio-Language Models | ICASSP 2025, Interspeech 2024
☆34 · Mar 14, 2025 · Updated last year
whojavumusic / HARP
HARP: A Large-Scale Higher-Order Ambisonic Room Impulse Response Dataset
☆35 · Jun 3, 2025 · Updated last year
JishengBai / AudioSetCaps
A 6-million Audio-Caption Paired Dataset Built with an LLM- and ALM-based Automatic Pipeline
☆208 · Dec 13, 2024 · Updated last year
nttcslab / msm-mae
Masked Spectrogram Modeling using Masked Autoencoders for Learning General-purpose Audio Representations
☆99 · Feb 20, 2026 · Updated 5 months ago
danpovey / conditional-flow-matching
☆29 · Aug 8, 2024 · Updated last year
zeyuxie29 / SemanticVocoder
☆28 · Apr 6, 2026 · Updated 3 months ago
qiuqiangkong / audio_understanding
☆131 · Feb 6, 2025 · Updated last year
Sreyan88 / ReCLAP
☆33 · Dec 23, 2025 · Updated 6 months ago
blmoistawinde / fense
Fluency ENhanced Sentence-bert Evaluation (FENSE), metric for audio caption evaluation. And Benchmark dataset AudioCaps-Eval, Clotho-Eval…
☆21 · Feb 1, 2023 · Updated 3 years ago
hkchengrex / av-benchmark
Benchmarking for Audio-Text and Audio-Visual Generation; Supports FAD, FD_VGG, FD_PANNs, FD_PaSST, IS_PaSST, IS_PANNs, KL_PaSST, KL_PANNs…
☆79 · Feb 14, 2026 · Updated 5 months ago
microsoft / CLAP
Learning audio concepts from natural language supervision
☆672 · Sep 18, 2024 · Updated last year
alanshaoTT / LAT-Audio-Repo
☆23 · Apr 28, 2026 · Updated 2 months ago
snap-research / GenAU
☆53 · Mar 24, 2026 · Updated 3 months ago
ETH-DISCO / sao-instruct
Official repo for SAO-Instruct: Free-form Audio Editing using Natural Language Instructions presented at NeurIPS 2025
☆17 · Oct 28, 2025 · Updated 8 months ago