ACAV100M: Automatic Curation of Large-Scale Datasets for Audio-Visual Video Representation Learning. In ICCV, 2021.
☆64Nov 18, 2021Updated 4 years ago
Alternatives and similar repositories for acav100m
Users that are interested in acav100m are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.
Sorting:
- The source code of ExFunTube☆10Aug 8, 2025Updated 8 months ago
- ☆43Feb 21, 2023Updated 3 years ago
- Visual search interface☆11Nov 30, 2021Updated 4 years ago
- This reporsitory contains metadata of WavCaps dataset and codes for downstream tasks.☆257Jul 25, 2024Updated last year
- Source code for "Synchformer: Efficient Synchronization from Sparse Cues" (ICASSP 2024)☆118Sep 15, 2025Updated 7 months ago
- Managed hosting for WordPress and PHP on Cloudways • AdManaged hosting for WordPress, Magento, Laravel, or PHP apps, on multiple cloud providers. Deploy in minutes on Cloudways by DigitalOcean.
- 🕵 Code for our EMNLP 2025 Main paper: "FlashAdventure: A Benchmark for GUI Agents Solving Full Story Arcs in Diverse Adventure Games"☆25Dec 14, 2025Updated 4 months ago
- ☆19Aug 19, 2021Updated 4 years ago
- A project for tri-modal LLM benchmarking and instruction tuning.☆57Mar 27, 2025Updated last year
- Audio-Visual Event Localization in Unconstrained Videos, ECCV 2018☆208Apr 3, 2021Updated 5 years ago
- Implementation of "Audio Retrieval with Natural Language Queries: A Benchmark Study".☆54Jul 16, 2025Updated 9 months ago
- Official implementation for AVGN☆41Mar 24, 2023Updated 3 years ago
- The repo for "Class-aware Sounding Objects Localization", TPAMI 2021.☆29Mar 4, 2022Updated 4 years ago
- ☆117Mar 24, 2026Updated 3 weeks ago
- Unified Multisensory Perception: Weakly-Supervised Audio-Visual Video Parsing, ECCV, 2020. (Spotlight)☆90Jul 25, 2024Updated last year
- Managed hosting for WordPress and PHP on Cloudways • AdManaged hosting for WordPress, Magento, Laravel, or PHP apps, on multiple cloud providers. Deploy in minutes on Cloudways by DigitalOcean.
- A curated list of audio-visual learning methods and datasets.☆287Dec 3, 2024Updated last year
- Source code for "Sparse in Space and Time: Audio-visual Synchronisation with Trainable Selectors." (Spotlight at the BMVC 2022)☆56Jan 29, 2024Updated 2 years ago
- The dataset and baseline code for Text-to-Audio Grounding (TAG)☆50Oct 23, 2025Updated 5 months ago
- ESPER☆24Mar 29, 2024Updated 2 years ago
- A dataset for Audio-Visual Sound Event Detection in Movies☆26Jan 23, 2023Updated 3 years ago
- Official Pytorch Implementation for Continual Learning For On-Device Environmental Sound Classification☆14Jul 19, 2022Updated 3 years ago
- Official Codebase of "A Closer Look at Weakly-Supervised Audio-Visual Source Localization" (NeurIPS 2022)☆21Dec 6, 2022Updated 3 years ago
- Understanding and Tackling Hallucinations in Large Audio-Language Models | ICASSP 2025, Interspeech 2024☆34Mar 14, 2025Updated last year
- The original weights of some Caffe models, ported to PyTorch.☆11Jan 18, 2022Updated 4 years ago
- Wordpress hosting with auto-scaling - Free Trial • AdFully Managed hosting for WordPress and WooCommerce businesses that need reliable, auto-scalable performance. Cloudways SafeUpdates now available.
- VGGSound: A Large-scale Audio-Visual Dataset☆357Sep 13, 2021Updated 4 years ago
- [NIPS2023] Code and Model for VAST: A Vision-Audio-Subtitle-Text Omni-Modality Foundation Model and Dataset☆300Mar 14, 2024Updated 2 years ago
- PodcastMix A dataset for separating music and speech in podcasts.☆44Aug 20, 2024Updated last year
- Audio captioning recipe☆52Oct 23, 2025Updated 5 months ago
- Pronunciation-assisted Subword Modeling☆31May 30, 2019Updated 6 years ago
- A 6-million Audio-Caption Paired Dataset Built with a LLMs and ALMs-based Automatic Pipeline☆200Dec 13, 2024Updated last year
- Official implementation for MGN☆20Dec 22, 2022Updated 3 years ago
- ☆12Nov 22, 2022Updated 3 years ago
- Code for ICASSP 2024 Paper: RECAP: Retrieval-Augmented Audio Captioning☆15Jun 23, 2024Updated last year
- Virtual machines for every use case on DigitalOcean • AdGet dependable uptime with 99.99% SLA, simple security tools, and predictable monthly pricing with DigitalOcean's virtual machines, called Droplets.
- [ICLR 2026] Data Pipeline, Models, and Benchmark for Omni-Captioner.☆130Apr 7, 2026Updated last week
- Unsupervised spoken sentence embeddings☆14Dec 14, 2022Updated 3 years ago
- Inference codebase for "Cacophony: An Improved Contrastive Audio-Text Model". Preprint: https://arxiv.org/abs/2402.06986☆49Jan 19, 2026Updated 3 months ago
- Spherical residual vector quantization (SRVQ)☆31Aug 25, 2024Updated last year
- Official repository of PanoAVQA: Grounded Audio-Visual Question Answering in 360° Videos (ICCV 2021)☆16Oct 12, 2021Updated 4 years ago
- This repository contains the code for our ECCV 2022 paper "Temporal and cross-modal attention for audio-visual zero-shot learning"☆25Sep 12, 2025Updated 7 months ago
- Code and Pretrained Models for ICLR 2023 Paper "Contrastive Audio-Visual Masked Autoencoder".☆290Mar 20, 2024Updated 2 years ago