[NeurIPS 2023 - ML for Audio Workshop (Oral)] Zero-shot audio captioning with audio-language model guidance and audio context keywords
☆18Nov 30, 2024Updated last year
Alternatives and similar repositories for ZerAuCap
Users that are interested in ZerAuCap are comparing it to the libraries listed below
Sorting:
- PyTorch implementation of TinyWASE described in our paper "Compressing Speaker Extraction Model with Ultra-low Precision Quantization and…☆11Jun 28, 2021Updated 4 years ago
- ☆11Oct 8, 2020Updated 5 years ago
- Implementation of different noise embeddings for noise aware training of Kaldi acoustic models.☆13Feb 13, 2021Updated 5 years ago
- This is the official implementation of the ICML 2023 paper - Can Forward Gradient Match Backpropagation ?☆13May 31, 2023Updated 2 years ago
- An implementation of the Wav2Letter Speech-to-Text model using PyTorch.☆14Mar 8, 2023Updated 2 years ago
- One-Shot Unsupervised Cross Domain Detection☆13Nov 22, 2022Updated 3 years ago
- ☆16Sep 12, 2019Updated 6 years ago
- ☆12Jun 10, 2021Updated 4 years ago
- This repository provides data and code for "Vox Populi, Vox DIY: Benchmark Dataset for Crowdsourced Audio Transcription" paper.☆16Jul 22, 2021Updated 4 years ago
- A duration-invariant audio-to-lyrics alignment pipeline with low memory footprint which segments long music recordings via a recursive bi…☆15Oct 13, 2022Updated 3 years ago
- The official implementation of DMEL the method presented in the paper "DMEL: The differentiable log-Mel spectrogram as a trainable layer …☆22Dec 21, 2024Updated last year
- ☆17Apr 14, 2023Updated 2 years ago
- A neural language modeling toolkit built on PyTorch☆19Mar 17, 2023Updated 2 years ago
- This repo contains the code to reproduce the paper: "Enriched Music Representations with Multiple Cross-modal Contrastive Learning"☆15Jun 22, 2023Updated 2 years ago
- A corpus of speech from the Joe Rogan Experience podcast, consisting of 8.43 million words. It includes aligned TextGrids with phonetic a…☆21Jan 26, 2020Updated 6 years ago
- ☆16Jun 13, 2022Updated 3 years ago
- ☆17Aug 27, 2025Updated 6 months ago
- Python implementation of CTC beam search decoder + agnostic LM scorer☆20Dec 16, 2020Updated 5 years ago
- BurrMill core☆22Nov 2, 2021Updated 4 years ago
- Room acoustic simulator with a SOFA file loader.☆23Sep 27, 2024Updated last year
- Baseline convolutional ASR system in PyTorch☆21Nov 16, 2023Updated 2 years ago
- ☆24Mar 13, 2020Updated 5 years ago
- Conditioned U-Net for Music Source Separation☆20May 15, 2021Updated 4 years ago
- Convert kaldi feature extraction and nnet3 models into Tensorflow Lite models. Currently aimed at converting kaldi's x-vector models and …☆20Oct 6, 2022Updated 3 years ago
- Data and code related to the ICASSP submission "A comparison of methods for OOV-word recognition"☆17Nov 28, 2021Updated 4 years ago
- Implementation for "SoundCLR: Contrastive Learning of Representations For Improved Environmental Sound Classification," in pytorch.☆28Jan 18, 2024Updated 2 years ago
- ☆21Aug 29, 2019Updated 6 years ago
- Voice100 includes neural TTS/ASR models. Inference of Voice100 is low cost as its models are tiny and only depend on CNN without autoregr…☆28Nov 23, 2023Updated 2 years ago
- Docker image and scripts for training finetuned or completely personal Kaldi speech models. Particularly for use with kaldi-active-gramma…☆21Jan 24, 2022Updated 4 years ago
- ☆21Sep 24, 2018Updated 7 years ago
- Convert words to numbers☆21Apr 13, 2022Updated 3 years ago
- Code for our ACML and INTERSPEECH papers: "Speaker Diarization as a Fully Online Bandit Learning Problem in MiniVox".☆29Sep 20, 2021Updated 4 years ago
- Make-An-Audio-3: Transforming Text/Video into Audio via Flow-based Large Diffusion Transformers☆118May 19, 2025Updated 9 months ago
- A collection of utilities for handling IPA phones.☆26Sep 24, 2023Updated 2 years ago
- Official Repository for "SingFake: Singing Voice Deepfake Detection"☆63Feb 26, 2024Updated 2 years ago
- Frechet Audio Distance evaluation in PyTorch☆36Jun 9, 2023Updated 2 years ago
- Understanding and Tackling Hallucinations in Large Audio-Language Models | ICASSP 2025, Interspeech 2024☆32Mar 14, 2025Updated 11 months ago
- Balanced Error Rate for Speaker Diarization☆33Feb 28, 2023Updated 3 years ago
- ☆32Jul 27, 2022Updated 3 years ago