audio-captioning/caption-evaluation-tools

Tools for the evaluation of audio captioning.

☆19

Alternatives and similar repositories for caption-evaluation-tools

Users who are interested in caption-evaluation-tools are comparing it to the libraries listed below.

XinhaoMei / DCASE2021_task6_v2
Code for CVSSP submission to DCASE 2021 Task 6
☆36 · Nov 22, 2022 · Updated 3 years ago
felixgontier / dcase-2023-baseline
☆14 · Mar 25, 2023 · Updated 3 years ago
Labbeti / aac-metrics
Metrics for evaluating Automated Audio Captioning systems, designed for PyTorch.
☆75 · Mar 22, 2026 · Updated 3 months ago
xieh97 / dcase2023-audio-retrieval
Baseline system for Language-based Audio Retrieval (Task 6B) in DCASE 2023 Challenge
☆10 · Aug 8, 2023 · Updated 2 years ago
audio-captioning / clotho-dataset
Python code for handling the Clotho dataset.
☆85 · Nov 24, 2020 · Updated 5 years ago
audio-captioning / audio-captioning-resources
A list of resources that can help in research for automated audio captioning
☆34 · Feb 17, 2021 · Updated 5 years ago
RicherMans / AudioCaption
Dataset and baseline for the first Audiocaption task
☆79 · Jul 25, 2024 · Updated last year
snap-research / GenAU
☆53 · Mar 24, 2026 · Updated 3 months ago
qiuqiangkong / materials_for_students
☆16 · Aug 10, 2025 · Updated 11 months ago
ilpoviertola / V-AURA
The official implementation of V-AURA: Temporally Aligned Audio for Video with Autoregression (ICASSP 2025) (Oral)
☆35 · Feb 11, 2026 · Updated 5 months ago
soham97 / ADIFF
Explaining audio differences using language
☆16 · Feb 11, 2025 · Updated last year
magronp / phase-madtwinnet
Code for phase recovery in MadTwinNet for monaural singing voice separation
☆12 · Jul 17, 2018 · Updated 8 years ago
wsntxxn / TextToAudioGrounding
The dataset and baseline code for Text-to-Audio Grounding (TAG)
☆49 · Oct 23, 2025 · Updated 8 months ago
EmilianPostolache / stable-audio-controlnet
Fine-tune Stable Audio Open with DiT ControlNet.
☆256 · May 16, 2025 · Updated last year
Ceaglex / LoVA
The code and weight for LoVA. LoVA is a novel model for Long-form Video-to-Audio generation. Based on the Diffusion Transformer (DiT) arc…
☆16 · Feb 27, 2025 · Updated last year
uthree / ddsp-vocoder
☆12 · Nov 7, 2024 · Updated last year
gzhu06 / Cacophony
Inference codebase for "Cacophony: An Improved Contrastive Audio-Text Model". Preprint: https://arxiv.org/abs/2402.06986
☆49 · Jan 19, 2026 · Updated 6 months ago
v0lta / Spectral-RNN
Spectral RNNs with adaptive window learning in TensorFlow, ICANN 2020.
☆10 · Sep 20, 2021 · Updated 4 years ago
shengcanxu / canoSpeech
Text to speech
☆10 · Mar 19, 2024 · Updated 2 years ago
andrebola / contrastive-mir-learning
This repo contains the code to reproduce the paper: "Enriched Music Representations with Multiple Cross-modal Contrastive Learning"
☆15 · Jun 22, 2023 · Updated 3 years ago
audio-captioning / dcase-2020-baseline
Audio captioning baseline system for DCASE 2020 challenge.
☆38 · Aug 22, 2023 · Updated 2 years ago
archinetai / aligner-pytorch
Sequence alignment methods with helpers for PyTorch.
☆24 · Nov 30, 2022 · Updated 3 years ago
junwoopark92 / PUBG-Gun-Sound-Dataset
"Enemy Spotted: In-game Gun Sound Dataset for Gunshot Classification and Localization", accepted at IEEE Conference on Games (CoG) 2022
☆24 · Sep 6, 2024 · Updated last year
dr-costas / undaw
Unsupervised Domain Adaptation for Acoustic Scene Classification with Wasserstein Distance
☆14 · Sep 16, 2020 · Updated 5 years ago
soham97 / mellow
Small audio language model for reasoning
☆88 · Dec 4, 2025 · Updated 7 months ago
juhannam / gct634-2024
Code repository for GCT634 Musical Applications of Machine Learning (Spring 2024)
☆11 · May 19, 2024 · Updated 2 years ago
luosiallen / Diff-Foley
Diff-Foley: Synchronized Video-to-Audio Synthesis with Latent Diffusion Models
☆205 · May 29, 2024 · Updated 2 years ago
MorenoLaQuatra / audiocaps-download
This package aims to simplify the download of the AudioCaps dataset.
☆35 · Dec 1, 2023 · Updated 2 years ago
Many0therFunctions / MaskGCT-Text-To-Semantic-Finetune
This is not remotely close to a finished product, and does not intend to nor does this claim to be working fine-tuning code for MaskGCT. …
☆13 · Dec 4, 2024 · Updated last year
d2l-ai / d2l-zh-tensorflow-colab
Automatically Generated d2l-zh TensorFlow Notebooks for Colab
☆12 · Aug 18, 2023 · Updated 2 years ago
pariajm / e2e-asr-and-disfluency-removal-evaluator
A new metric for evaluating end-to-end speech recognition and disfluency removal systems
☆19 · Mar 7, 2021 · Updated 5 years ago
samsad35 / code-ancogen
[ICASSP 2025] AnCoGen: Analysis, Control and Generation of Speech with a Masked Autoencoder
☆14 · Mar 11, 2025 · Updated last year
ZehuaKcrissLi / GTR-Voice
☆16 · Nov 11, 2024 · Updated last year
raymondxu / java-workshop
Intermediate Java workshop on variables, abstraction, and design patterns ☕
☆10 · Sep 7, 2017 · Updated 8 years ago
YuejieGao / TG-CRITIC
TG-CRITIC: A Timbre-Guided Model for Reference-Independent Singing Evaluation
☆18 · May 26, 2023 · Updated 3 years ago
AV-Reasoner / AV-Reasoner
☆19 · Jul 22, 2025 · Updated 11 months ago
danpovey / conditional-flow-matching
☆29 · Aug 8, 2024 · Updated last year
merlresearch / sebbs
Prediction of sound event bounding boxes (SEBBs)
☆35 · Aug 2, 2024 · Updated last year
michaelneri / unsupervised-audio-anomaly-detection
Official repository of the work "Low-complexity Unsupervised Audio Anomaly Detection exploiting Separable Convolutions and Angular Loss" …
☆11 · Nov 6, 2024 · Updated last year