ExplainableML/ZerAuCap

[NeurIPS 2023 - ML for Audio Workshop (Oral)] Zero-shot audio captioning with audio-language model guidance and audio context keywords

☆19

Alternatives and similar repositories for ZerAuCap

Users who are interested in ZerAuCap are comparing it to the libraries listed below.

ExplainableML / TCAF-GZSL
This repository contains the code for our ECCV 2022 paper "Temporal and cross-modal attention for audio-visual zero-shot learning"
☆25 · Sep 12, 2025 · Updated 10 months ago
ExplainableML / CLEVR-X
CLEVR-X: A Visual Reasoning Dataset for Natural Language Explanations
☆30 · Oct 27, 2023 · Updated 2 years ago
felixgontier / dcase-2023-baseline
☆14 · Mar 25, 2023 · Updated 3 years ago
andrebola / contrastive-mir-learning
This repo contains the code to reproduce the paper: "Enriched Music Representations with Multiple Cross-modal Contrastive Learning"
☆15 · Jun 22, 2023 · Updated 3 years ago
chrysts / generative_preconditioner
☆11 · Oct 8, 2020 · Updated 5 years ago
stephane-rivaud / ForwardLocalGradient
This is the official implementation of the ICML 2023 paper "Can Forward Gradient Match Backpropagation?"
☆13 · May 31, 2023 · Updated 3 years ago
shlizee / Audeo
☆31 · Feb 4, 2021 · Updated 5 years ago
robaru / sofamyroom
Room acoustic simulator with a SOFA file loader.
☆24 · Sep 27, 2024 · Updated last year
ExplainableML / ImageFreeZSL
☆18 · Oct 5, 2024 · Updated last year
desh2608 / kaldi-noise-vectors
Implementation of different noise embeddings for noise aware training of Kaldi acoustic models.
☆13 · Feb 13, 2021 · Updated 5 years ago
oncescuandreea / audio-retrieval
Implementation of "Audio Retrieval with Natural Language Queries", INTERSPEECH 2021, PyTorch
☆26 · Aug 18, 2023 · Updated 2 years ago
sinbycosmay / Computer-Graphics-Project
☆14 · May 25, 2021 · Updated 5 years ago
aispeech-lab / TinyWASE
PyTorch implementation of TinyWASE described in our paper "Compressing Speaker Extraction Model with Ultra-low Precision Quantization and…
☆11 · Jun 28, 2021 · Updated 5 years ago
Labbeti / aac-metrics
Metrics for evaluating Automated Audio Captioning systems, designed for PyTorch.
☆75 · Mar 22, 2026 · Updated 3 months ago
akoepke / audio-retrieval-benchmark
Code for "Audio Retrieval with Natural Language Queries: A Benchmark Study", Transactions on Multimedia 2022
☆54 · Jul 16, 2025 · Updated last year
v-iashin / Synchformer
Source code for "Synchformer: Efficient Synchronization from Sparse Cues" (ICASSP 2024)
☆130 · Sep 15, 2025 · Updated 10 months ago
crypto-code / Music-Representation-Comparison
This is the repo with the code to conduct a comparative analysis of different audio representation models.
☆11 · Aug 31, 2023 · Updated 2 years ago
LiDCC / MERTech
Official code of ICASSP 2024 paper "MERTech: Instrument Playing Technique Detection Using Self-Supervised Pretrained Model With Multi-Tas…
☆11 · Jun 14, 2024 · Updated 2 years ago
Text-to-Audio / Make-An-Audio-3
Make-An-Audio-3: Transforming Text/Video into Audio via Flow-based Large Diffusion Transformers
☆121 · May 19, 2025 · Updated last year
Veleslavia / conditioned-u-net
Conditioned U-Net for Music Source Separation
☆20 · May 15, 2021 · Updated 5 years ago
wolfparticle / lee-nlp_asr2020
Notes compiled mainly from the materials of Hung-yi Lee's 2020 Human Language Processing course, including code and slides
☆34 · May 25, 2021 · Updated 5 years ago
jbeliao / SLAM
☆16 · Sep 12, 2019 · Updated 6 years ago
ICASSP2021-tutorial9 / Distant_conversational_ASR_and_analysis
☆12 · Jun 10, 2021 · Updated 5 years ago
cstein163 / SwinUnetUnetlikePureTransformerforMedicalImageSegmentation-CodeReroduction
It is a very simple code reproduction
☆13 · Apr 22, 2022 · Updated 4 years ago
wdqqdw / Echo
Project page of "2026-ICLR Echo: Towards Advanced Audio Comprehension via Audio-Interleaved Reasoning"
☆16 · Mar 26, 2026 · Updated 3 months ago
keunwoochoi / music4all_contrib
☆32 · Dec 29, 2020 · Updated 5 years ago
risclite / rv32m-multiplier-and-divider
A multiplier and divider Verilog RTL file for RV32M instructions
☆14 · Mar 17, 2020 · Updated 6 years ago
hendriks73 / directional_cnns
Source code repository for the SMC paper "Musical Tempo and Key Estimation using Convolutional Neural Networks with Directional Filters".
☆33 · Mar 24, 2023 · Updated 3 years ago
assafmu / wav2letter_pytorch
An implementation of the Wav2Letter Speech-to-Text model using PyTorch.
☆14 · Mar 8, 2023 · Updated 3 years ago
abdfahim / audioprocessing
Standard libraries for audio processing, especially STFT and Spherical Harmonics decomposition of a soundfield.
☆10 · Nov 29, 2021 · Updated 4 years ago
seungheondoh / speech-to-music
Textless Speech-to-Music Retrieval Using Emotion Similarity [ICASSP23]
☆17 · Aug 16, 2023 · Updated 2 years ago
drscotthawley / fad_pytorch
Frechet Audio Distance evaluation in PyTorch
☆36 · Jun 9, 2023 · Updated 3 years ago
emirdemirel / ASA_ICASSP2021
A duration-invariant audio-to-lyrics alignment pipeline with low memory footprint which segments long music recordings via a recursive bi…
☆15 · Oct 13, 2022 · Updated 3 years ago
shansongliu / HumTrans
☆13 · Sep 26, 2023 · Updated 2 years ago
d3n7 / riffusionPrepper
Prepare spectrograms from audio for training a Riffusion model
☆16 · Mar 6, 2023 · Updated 3 years ago
isack-ml / LatentGaze
☆18 · Sep 22, 2022 · Updated 3 years ago
dengandong / Videos-Publications-Collection
This is a collection of publications about videos.
☆18 · Apr 29, 2021 · Updated 5 years ago
yongyizang / SingFake
Official Repository for "SingFake: Singing Voice Deepfake Detection"
☆64 · Feb 26, 2024 · Updated 2 years ago
xmed-lab / AdaCon
IEEE TMI 2021: AdaCon: Adaptive Contrast for Image Regression in Computer-Aided Disease Assessment
☆21 · Mar 21, 2022 · Updated 4 years ago