Source code for the paper 'Audio Captioning Transformer'
★57 · Jan 18, 2022 · Updated 4 years ago
Alternatives and similar repositories for ACT
Users that are interested in ACT are comparing it to the libraries listed below
- This repository contains metadata for the WavCaps dataset and code for downstream tasks. ★257 · Jul 25, 2024 · Updated last year
- Repository for our NAACL-HLT 2019 paper: AudioCaps ★203 · Oct 6, 2025 · Updated 4 months ago
- Implementation of our paper 'On Metric Learning For Audio-Text Cross-Modal Retrieval' ★50 · May 17, 2022 · Updated 3 years ago
- ★11 · May 7, 2022 · Updated 3 years ago
- The open source code for LLM-Codec ★145 · Aug 18, 2024 · Updated last year
- Code for CVSSP submission to DCASE 2021 Task 6 ★36 · Nov 22, 2022 · Updated 3 years ago
- Official Implementation of EnCLAP (ICASSP 2024) ★94 · Jun 2, 2024 · Updated last year
- Implementation of "Audio Retrieval with Natural Language Queries: A Benchmark Study". ★54 · Jul 16, 2025 · Updated 7 months ago
- wake-up word emotion recognition [APSIPA 2022] ★17 · Nov 11, 2022 · Updated 3 years ago
- ★40 · Apr 2, 2025 · Updated 11 months ago
- PyTorch implementation of the ICASSP-24 paper: "Improving Audio Captioning Models with Fine-grained Audio Features, Text Embedding Superv… ★38 · Jan 6, 2024 · Updated 2 years ago
- Official Implementation of "Prefix tuning for Automated Audio Captioning (ICASSP 2023)" ★31 · Dec 6, 2023 · Updated 2 years ago
- ★43 · Feb 21, 2023 · Updated 3 years ago
- Efficient Personalized Speech Enhancement through Self-Supervised Learning ★23 · Mar 12, 2023 · Updated 2 years ago
- speaker-disentangled speech linguistic content quantizer ★24 · Mar 19, 2025 · Updated 11 months ago
- The dataset and baseline code for Text-to-Audio Grounding (TAG) ★50 · Oct 23, 2025 · Updated 4 months ago
- Official implementation for FlowSep ★70 · Jan 2, 2025 · Updated last year
- A 6-million Audio-Caption Paired Dataset Built with an LLM- and ALM-based Automatic Pipeline ★196 · Dec 13, 2024 · Updated last year
- Audio captioning recipe ★51 · Oct 23, 2025 · Updated 4 months ago
- T5Voice is a lightweight PyTorch implementation of T5-based text-to-speech synthesis, supporting both streaming and non-streaming speech … ★28 · Nov 7, 2025 · Updated 3 months ago
- ★18 · May 4, 2025 · Updated 9 months ago
- Audio Captioning datasets for PyTorch. ★126 · Jul 18, 2025 · Updated 7 months ago
- Incorporating AutoVocoder to MB-iSTFT-VITS ★48 · Dec 1, 2022 · Updated 3 years ago
- ★36 · Sep 6, 2025 · Updated 5 months ago
- Make-An-Audio-3: Transforming Text/Video into Audio via Flow-based Large Diffusion Transformers ★118 · May 19, 2025 · Updated 9 months ago
- This is the official train-dev-test release of the Interspeech2024 Discrete Speech Representation Challenge. ★32 · Jan 26, 2024 · Updated 2 years ago
- Official PyTorch implementation of (ICME2025 oral) "AutoStyle-TTS: Retrieval-Augmented Generation based Automatic Style Matching Text-to-… ★16 · Feb 1, 2026 · Updated last month
- Inference code for Audiodec-Valle-Wenetspeech4TTS ★50 · Jul 14, 2024 · Updated last year
- Code, Dataset, and Pretrained Models for Audio and Speech Large Language Model "Listen, Think, and Understand". ★469 · Apr 24, 2024 · Updated last year
- This repo hosts the code and model of "Separate What You Describe: Language-Queried Audio Source Separation", Interspeech 2022 ★145 · Oct 11, 2023 · Updated 2 years ago
- This toolbox aims to unify audio generation model evaluation for easier comparison. ★376 · Sep 29, 2024 · Updated last year
- The latent diffusion model for text-to-music generation. ★185 · Jan 26, 2024 · Updated 2 years ago
- Inference codebase for "Cacophony: An Improved Contrastive Audio-Text Model". Preprint: https://arxiv.org/abs/2402.06986 ★48 · Jan 19, 2026 · Updated last month
- PyTorch Dataset for Speech and Music audio ★80 · Jul 12, 2024 · Updated last year
- Singing Voice Conversion Challenge 2023 Starter Kit: FastSVC Reimplementation ★116 · Nov 25, 2023 · Updated 2 years ago
- Reimplementation of Miipher ★29 · Aug 16, 2023 · Updated 2 years ago
- The source code of our paper "Diffsound: discrete diffusion model for text-to-sound generation" ★366 · Aug 3, 2023 · Updated 2 years ago
- Code for ICASSP 2024 Paper: RECAP: Retrieval-Augmented Audio Captioning ★16 · Jun 23, 2024 · Updated last year
- Repo of the paper "Towards Building an End-to-End Multilingual Automatic Lyrics Transcription Model" ★15 · Jun 28, 2024 · Updated last year