MorenoLaQuatra/audiocaps-download

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/MorenoLaQuatra/audiocaps-download)

MorenoLaQuatra / audiocaps-download

This package aims at simplifying the download of the AudioCaps dataset.

☆35

Alternatives and similar repositories for audiocaps-download

Users that are interested in audiocaps-download are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

cdjkim / audiocaps
View on GitHub
🔊 Repository for our NAACL-HLT 2019 paper: AudioCaps
☆215Oct 6, 2025Updated 9 months ago
MorenoLaQuatra / audioset-download
View on GitHub
This package aims at simplifying the download of the AudioSet dataset.
☆60Jul 17, 2025Updated last year
XinhaoMei / WavCaps
View on GitHub
This reporsitory contains metadata of WavCaps dataset and codes for downstream tasks.
☆264Jul 25, 2024Updated 2 years ago
MorenoLaQuatra / vad
View on GitHub
Simple voice activity detection (VAD) algorithm in Python
☆15Aug 10, 2023Updated 2 years ago
wdqqdw / Echo
View on GitHub
Project page of "2026-ICLR Echo: Towards Advanced Audio Comprehension via Audio-Interleaved Reasoning"
☆16Mar 26, 2026Updated 4 months ago
Deploy to Railway using AI coding agents - Free Credits Offer • Ad
Use Claude Code, Codex, OpenCode, and more. Autonomous software development now has the infrastructure to match with Railway.
happylittlecat2333 / Auffusion
View on GitHub
Official codes and models of the paper "Auffusion: Leveraging the Power of Diffusion and Large Language Models for Text-to-Audio Generati…
☆194Mar 25, 2024Updated 2 years ago
Xia-aaa / L3former
View on GitHub
☆14Jun 26, 2025Updated last year
dkavargy / ESCOPlus2.0
View on GitHub
An open-source framework designed to extend and maintain the ESCO taxonomy
☆18Mar 12, 2026Updated 4 months ago
speedyseal / audiosetdl
View on GitHub
Scripts for download AudioSet
☆89Nov 7, 2017Updated 8 years ago
haoheliu / audioldm_eval
View on GitHub
This toolbox aims to unify audio generation model evaluation for easier comparison.
☆390Sep 29, 2024Updated last year
MorenoLaQuatra / bart-it
View on GitHub
Pre-training BART model for the Italian Language
☆16Dec 28, 2022Updated 3 years ago
audio-captioning / caption-evaluation-tools
View on GitHub
Tools for the evaluation of audio captioning.
☆19May 23, 2020Updated 6 years ago
wonjune-kang / llm-speech-summarization
View on GitHub
Prompting Large Language Models with Audio for General-Purpose Speech Summarization
☆20May 14, 2025Updated last year
PeihaoChen / regnet
View on GitHub
Official PyTorch implementation of the TIP paper "Generating Visually Aligned Sound from Videos" and the corresponding Visually Aligned S…
☆53Dec 15, 2020Updated 5 years ago
Wordpress hosting with auto-scaling - Free Trial Offer • Ad
Fully Managed hosting for WordPress and WooCommerce businesses that need reliable, auto-scalable performance. Cloudways SafeUpdates now available.
seungheondoh / msu-benchmark
View on GitHub
music semantic understanding evaluation benchmark
☆24Aug 12, 2023Updated 2 years ago
Labbeti / aac-datasets
View on GitHub
Audio Captioning datasets for PyTorch.
☆129Mar 25, 2026Updated 4 months ago
umbertocappellazzo / Llama-AVSR
View on GitHub
Official Pytorch implementation of "Large Language Models are Strong Audio-Visual Speech Recognition Learners" [ICASSP 2025] and "Mitigat…
☆64Jan 18, 2026Updated 6 months ago
zhaoyanpeng / vipant
View on GitHub
VIsually-Pivoted Audio and(N) Text
☆22May 16, 2022Updated 4 years ago
audio-captioning / clotho-dataset
View on GitHub
Python code for handling the Clotho dataset.
☆85Nov 24, 2020Updated 5 years ago
XinhaoMei / ACT
View on GitHub
Source code for the paper 'Audio Captioning Transformer'
☆56Jan 18, 2022Updated 4 years ago
LoieSun / Auto-ACD
View on GitHub
code for A Large-scale Dataset for Audio-Language Representation Learning
☆14Sep 18, 2024Updated last year
LAION-AI / audio-dataset
View on GitHub
Audio Dataset for training CLAP and other models
☆748Jan 8, 2026Updated 6 months ago
Sakshi113 / MMAU
View on GitHub
☆156Feb 9, 2026Updated 5 months ago
Wordpress hosting with auto-scaling - Free Trial Offer • Ad
Fully Managed hosting for WordPress and WooCommerce businesses that need reliable, auto-scalable performance. Cloudways SafeUpdates now available.
ariesssxu / vta-ldm
View on GitHub
☆61Jun 15, 2025Updated last year
lukewys / dcase_2020_T6
View on GitHub
2nd place solution for 2020 DCASE challenge task 6 audio captioning. http://dcase.community/challenge2020/task-automatic-audio-captioning…
☆24Aug 3, 2023Updated 2 years ago
MU94W / TTS-Eval
View on GitHub
☆18Aug 9, 2018Updated 7 years ago
microsoft / CLAP
View on GitHub
Learning audio concepts from natural language supervision
☆673Sep 18, 2024Updated last year
BriansIDP / AudioVisualLLM
View on GitHub
☆19May 19, 2024Updated 2 years ago
emo-box / EmoBox
View on GitHub
[INTERSPEECH 2024] EmoBox: Multilingual Multi-corpus Speech Emotion Recognition Toolkit and Benchmark
☆321Mar 18, 2026Updated 4 months ago
Hannieliao / Baton
View on GitHub
Official Repository of IJCAI 2024 Paper: "BATON: Aligning Text-to-Audio Model with Human Preference Feedback"
☆32Mar 4, 2025Updated last year
NIRALUser / DTIPlayground
View on GitHub
An integrated framework for DWI Image QC and processing
☆13Mar 9, 2026Updated 4 months ago
hhhaaahhhaa / ASR-TTA
View on GitHub
☆16Nov 4, 2025Updated 8 months ago
GPU virtual machines on DigitalOcean Gradient AI • Ad
Get to production fast with high-performance AMD and NVIDIA GPUs you can spin up in seconds. The definition of operational simplicity.
LAION-AI / CLAP
View on GitHub
Contrastive Language-Audio Pretraining
☆2,234May 15, 2025Updated last year
XYPB / CondFoleyGen
View on GitHub
Official PyTorch implementation of "Conditional Generation of Audio from Video via Foley Analogies".
☆93Dec 8, 2023Updated 2 years ago
seungheondoh / msd-subsets
View on GitHub
million song dataset split for extended clean tag & artist-level stratified
☆52Aug 12, 2023Updated 2 years ago
luosiallen / Diff-Foley
View on GitHub
Diff-Foley: Synchronized Video-to-Audio Synthesis with Latent Diffusion Models
☆206May 29, 2024Updated 2 years ago
gwh22 / LAFMA
View on GitHub
LAFMA: A Latent Flow Matching Model for Text-to-Audio Generation (INTERSPEECH 2024)
☆44Jun 13, 2024Updated 2 years ago
Ming-er / MGA-CLAP
View on GitHub
official implementation of MGA-CLAP (ACM MM 2024)
☆29Oct 25, 2024Updated last year
jaeyeonkim99 / EnCLAP
View on GitHub
Official Implementation of EnCLAP (ICASSP 2024)
☆96Jun 2, 2024Updated 2 years ago