ACAV100M: Automatic Curation of Large-Scale Datasets for Audio-Visual Video Representation Learning. In ICCV, 2021.
☆64Nov 18, 2021Updated 4 years ago
Alternatives and similar repositories for acav100m
Users that are interested in acav100m are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.
Sorting:
- The source code of ExFunTube☆10Aug 8, 2025Updated 7 months ago
- ☆43Feb 21, 2023Updated 3 years ago
- Visual search interface☆11Nov 30, 2021Updated 4 years ago
- This reporsitory contains metadata of WavCaps dataset and codes for downstream tasks.☆257Jul 25, 2024Updated last year
- Source code for "Synchformer: Efficient Synchronization from Sparse Cues" (ICASSP 2024)☆116Sep 15, 2025Updated 6 months ago
- Managed Database hosting by DigitalOcean • AdPostgreSQL, MySQL, MongoDB, Kafka, Valkey, and OpenSearch available. Automatically scale up storage and focus on building your apps.
- ☆20Aug 19, 2021Updated 4 years ago
- A project for tri-modal LLM benchmarking and instruction tuning.☆56Mar 27, 2025Updated last year
- Audio-Visual Event Localization in Unconstrained Videos, ECCV 2018☆207Apr 3, 2021Updated 4 years ago
- Implementation of "Audio Retrieval with Natural Language Queries: A Benchmark Study".☆54Jul 16, 2025Updated 8 months ago
- ☆38Jul 4, 2024Updated last year
- Official implementation for AVGN☆40Mar 24, 2023Updated 3 years ago
- The repo for "Class-aware Sounding Objects Localization", TPAMI 2021.☆29Mar 4, 2022Updated 4 years ago
- Dense-Localizing Audio-Visual Events in Untrimmed Videos: A Large-Scale Benchmark and Baseline (CVPR 2023)☆72Jan 4, 2026Updated 2 months ago
- ☆117Updated this week
- Managed hosting for WordPress and PHP on Cloudways • AdManaged hosting with the flexibility to host WordPress, Magento, Laravel, or PHP apps, on multiple cloud providers. Cloudways by DigitalOcean.
- Unified Multisensory Perception: Weakly-Supervised Audio-Visual Video Parsing, ECCV, 2020. (Spotlight)☆90Jul 25, 2024Updated last year
- A curated list of audio-visual learning methods and datasets.☆286Dec 3, 2024Updated last year
- Source code for "Sparse in Space and Time: Audio-visual Synchronisation with Trainable Selectors." (Spotlight at the BMVC 2022)☆55Jan 29, 2024Updated 2 years ago
- The dataset and baseline code for Text-to-Audio Grounding (TAG)☆50Oct 23, 2025Updated 5 months ago
- ESPER☆24Mar 29, 2024Updated 2 years ago
- A dataset for Audio-Visual Sound Event Detection in Movies☆26Jan 23, 2023Updated 3 years ago
- Understanding and Tackling Hallucinations in Large Audio-Language Models | ICASSP 2025, Interspeech 2024☆32Mar 14, 2025Updated last year
- Official Pytorch Implementation for Continual Learning For On-Device Environmental Sound Classification☆14Jul 19, 2022Updated 3 years ago
- Official Codebase of "A Closer Look at Weakly-Supervised Audio-Visual Source Localization" (NeurIPS 2022)☆20Dec 6, 2022Updated 3 years ago
- Wordpress hosting with auto-scaling on Cloudways • AdFully Managed hosting built for WordPress-powered businesses that need reliable, auto-scalable hosting. Cloudways SafeUpdates now available.
- VGGSound: A Large-scale Audio-Visual Dataset☆355Sep 13, 2021Updated 4 years ago
- [NIPS2023] Code and Model for VAST: A Vision-Audio-Subtitle-Text Omni-Modality Foundation Model and Dataset☆298Mar 14, 2024Updated 2 years ago
- PodcastMix A dataset for separating music and speech in podcasts.☆44Aug 20, 2024Updated last year
- Audio captioning recipe☆51Oct 23, 2025Updated 5 months ago
- ☆29Jul 4, 2025Updated 8 months ago
- Official implementation for MGN☆20Dec 22, 2022Updated 3 years ago
- ☆12Nov 22, 2022Updated 3 years ago
- Code for ICASSP 2024 Paper: RECAP: Retrieval-Augmented Audio Captioning☆15Jun 23, 2024Updated last year
- [ICLR 2026] Data Pipeline, Models, and Benchmark for Omni-Captioner.☆124Mar 18, 2026Updated last week
- Managed hosting for WordPress and PHP on Cloudways • AdManaged hosting with the flexibility to host WordPress, Magento, Laravel, or PHP apps, on multiple cloud providers. Cloudways by DigitalOcean.
- Inference codebase for "Cacophony: An Improved Contrastive Audio-Text Model". Preprint: https://arxiv.org/abs/2402.06986☆49Jan 19, 2026Updated 2 months ago
- Unsupervised spoken sentence embeddings☆14Dec 14, 2022Updated 3 years ago
- Official repository of PanoAVQA: Grounded Audio-Visual Question Answering in 360° Videos (ICCV 2021)☆17Oct 12, 2021Updated 4 years ago
- Jax implementation of VIT-VQGAN☆10Jan 25, 2024Updated 2 years ago
- Audio Captioning datasets for PyTorch.☆128Updated this week
- The code repo for ICASSP 2023 Paper "MMCosine: Multi-Modal Cosine Loss Towards Balanced Audio-Visual Fine-Grained Learning"☆25May 18, 2023Updated 2 years ago
- Code and Pretrained Models for ICLR 2023 Paper "Contrastive Audio-Visual Masked Autoencoder".☆288Mar 20, 2024Updated 2 years ago