LoieSun / Auto-ACD
Code for "A Large-scale Dataset for Audio-Language Representation Learning"
☆13 Updated 7 months ago
Alternatives and similar repositories for Auto-ACD:
Users who are interested in Auto-ACD are comparing it to the libraries listed below.
- Source code for the paper 'Audio Captioning Transformer' ☆54 Updated 3 years ago
- Encoder of BAT (Learning to Reason about Spatial Sounds with Large Language Models) ☆46 Updated 2 months ago
- The dataset and baseline code for Text-to-Audio Grounding (TAG) ☆42 Updated 3 months ago
- PyTorch implementation for "V2C: Visual Voice Cloning" ☆32 Updated 2 years ago
- Ego4DSounds: A diverse egocentric dataset with high action-audio correspondence ☆18 Updated 10 months ago
- ☆40 Updated 2 years ago
- [ICLR 2025] Enhancing Self-Supervised Models with Audio Mixtures for Polyphonic Soundscapes ☆30 Updated this week
- [CVPR 2024] AV2AV: Direct Audio-Visual Speech to Audio-Visual Speech Translation with Unified Audio-Visual Speech Representation ☆35 Updated 7 months ago
- The official implementation of V-AURA: Temporally Aligned Audio for Video with Autoregression (ICASSP 2025) ☆22 Updated 3 months ago
- small audio language model for reasoning ☆58 Updated last week
- ☆16 Updated last year
- Official Repository of IJCAI 2024 Paper: "BATON: Aligning Text-to-Audio Model with Human Preference Feedback" ☆27 Updated last month
- [Official Implementation] Acoustic Autoregressive Modeling ☆67 Updated 8 months ago
- Official Codebase of "A Unified Audio-Visual Learning Framework for Localization, Separation, and Recognition" (ICML 2023) ☆9 Updated last year
- ☆22 Updated 6 months ago
- Audio-Visual Corruption Modeling of our paper "Watch or Listen: Robust Audio-Visual Speech Recognition with Visual Corruption Modeling an… ☆33 Updated last year
- ☆32 Updated last month
- ☆34 Updated 3 weeks ago
- Official implementation for AVGN ☆34 Updated 2 years ago
- This package aims at simplifying the download of the AudioCaps dataset. ☆33 Updated last year
- Official Codebase of "A Closer Look at Weakly-Supervised Audio-Visual Source Localization" (NeurIPS 2022) ☆17 Updated 2 years ago
- A Multi-Task Evaluation Benchmark for Audio-Visual Representation Models (ICASSP 2024) ☆51 Updated last year
- We propose C2SER, a novel audio-language model designed to enhance the stability and accuracy of speech emotion recognition through conte… ☆26 Updated last month
- Metrics for evaluating Automated Audio Captioning systems, designed for PyTorch. ☆45 Updated last week
- Source code for "Sparse in Space and Time: Audio-visual Synchronisation with Trainable Selectors." (Spotlight at the BMVC 2022) ☆52 Updated last year
- ☆62 Updated last month
- This is the official repository of the papers "Parameter-Efficient Transfer Learning of Audio Spectrogram Transformers" and "Efficient Fi… ☆36 Updated 8 months ago
- A 6-million Audio-Caption Paired Dataset Built with a LLMs and ALMs-based Automatic Pipeline ☆126 Updated 4 months ago
- Official source code of the INTERSPEECH 2023 paper: "Audio-Visual Speech Separation in Noisy Environments with a Lightweight Iterative Mo… ☆19 Updated last year
- The official repo for Both Ears Wide Open: Towards Language-Driven Spatial Audio Generation ☆36 Updated 2 weeks ago