Repository for "Training Audio Captioning Models without Audio"
☆10 · Sep 26, 2023 · Updated 2 years ago
Alternatives and similar repositories for NoAudioCaptioning
Users interested in NoAudioCaptioning also compare it with the repositories listed below.
- Code for the paper: MACE: Leveraging Audio for Evaluating Audio Captioning Systems ☆13 · Jan 16, 2025 · Updated last year
- PAM is a no-reference audio quality metric for audio generation tasks ☆77 · Jul 19, 2024 · Updated last year
- ☆11 · Sep 25, 2024 · Updated last year
- ☆10 · Oct 16, 2025 · Updated 4 months ago
- Audio Entailment: Deductive Reasoning for Audio Understanding ☆17 · Dec 10, 2024 · Updated last year
- Official implementation of "Prefix tuning for Automated Audio Captioning" (ICASSP 2023) ☆31 · Dec 6, 2023 · Updated 2 years ago
- Official repo for the STRFNet system presented at INTERSPEECH 2020 ☆12 · Mar 6, 2021 · Updated 4 years ago
- [ICASSP'23] Online speaker clustering ☆17 · Updated this week
- ☆17 · Oct 16, 2018 · Updated 7 years ago
- ☆18 · Mar 13, 2024 · Updated last year
- ☆34 · Jun 9, 2025 · Updated 8 months ago
- ☆37 · Jul 4, 2024 · Updated last year
- ☆50 · Apr 13, 2025 · Updated 10 months ago
- The dataset and baseline code for Text-to-Audio Grounding (TAG) ☆50 · Oct 23, 2025 · Updated 4 months ago
- ☆22 · Mar 19, 2025 · Updated 11 months ago
- Official implementation of MGA-CLAP (ACM MM 2024) ☆30 · Oct 25, 2024 · Updated last year
- Music semantic understanding evaluation benchmark ☆25 · Aug 12, 2023 · Updated 2 years ago
- ☆20 · Mar 12, 2025 · Updated 11 months ago
- All the code needed to run multilingual DistilWhisper from the Ferraz et al. 2024 IEEE ICASSP paper ☆33 · Oct 23, 2025 · Updated 4 months ago
- PHO-LID: A Unified Model to Incorporate Acoustic-Phonetic and Phonotactic Information for Language Identification ☆21 · Aug 24, 2023 · Updated 2 years ago
- Official Implementation of EnCLAP (ICASSP 2024) ☆94 · Jun 2, 2024 · Updated last year
- Implementation of MathReader, Text-to-Speech for Mathematical Documents ☆27 · Sep 23, 2025 · Updated 5 months ago
- ☆68 · Dec 30, 2025 · Updated 2 months ago
- Towards Fine-grained Audio Captioning with Multimodal Contextual Cues ☆86 · Jan 4, 2026 · Updated last month
- ☆33 · Dec 23, 2025 · Updated 2 months ago
- Small audio language model for reasoning ☆86 · Dec 4, 2025 · Updated 2 months ago
- Understanding and Tackling Hallucinations in Large Audio-Language Models (ICASSP 2025, Interspeech 2024) ☆32 · Mar 14, 2025 · Updated 11 months ago
- ☆24 · Sep 10, 2025 · Updated 5 months ago
- Official repository of the IJCAI 2024 paper "BATON: Aligning Text-to-Audio Model with Human Preference Feedback" ☆32 · Mar 4, 2025 · Updated 11 months ago
- Tracking the state of the art and recent results (bibliography) on sound tasks ☆32 · Jan 10, 2023 · Updated 3 years ago
- Prediction of sound event bounding boxes (SEBBs) ☆32 · Aug 2, 2024 · Updated last year
- A MATLAB app to interactively navigate Ryze Tello drone, read navigation data, process image data and produce equivalent MATLAB code. Thi… ☆13 · Oct 22, 2025 · Updated 4 months ago
- SpeechJudge: Towards Human-Level Judgment for Speech Naturalness (https://arxiv.org/abs/2511.07931) ☆63 · Dec 23, 2025 · Updated 2 months ago
- ☆32 · Nov 24, 2024 · Updated last year
- A list of resources to help with research on automated audio captioning ☆34 · Feb 17, 2021 · Updated 5 years ago
- [INTERSPEECH 2025 Oral] Official code for "Accelerating Diffusion-based Text-to-Speech Model Training with Dual Modality Alignment" ☆64 · Jun 16, 2025 · Updated 8 months ago
- MuChoMusic is a benchmark for evaluating music understanding in multimodal audio-language models ☆44 · Dec 3, 2024 · Updated last year
- Qualtric or Qualtreat? Generate Qualtrics listening tests for Text-To-Speech evaluations ☆36 · Jun 25, 2024 · Updated last year
- Artificial-intelligence-based model for classifying wood species by their specific vibration characteristics ☆10 · Mar 24, 2021 · Updated 4 years ago