sangho-vision / acav100m
ACAV100M: Automatic Curation of Large-Scale Datasets for Audio-Visual Video Representation Learning. In ICCV, 2021.
☆63Nov 18, 2021Updated 4 years ago
Alternatives and similar repositories for acav100m
Users that are interested in acav100m are comparing it to the libraries listed below
- Visual search interface☆11Nov 30, 2021Updated 4 years ago
- ☆20Aug 19, 2021Updated 4 years ago
- This repository contains metadata of the WavCaps dataset and code for downstream tasks.☆256Jul 25, 2024Updated last year
- ☆43Feb 21, 2023Updated 2 years ago
- Source code for "Synchformer: Efficient Synchronization from Sparse Cues" (ICASSP 2024)☆107Sep 15, 2025Updated 5 months ago
- ☆37Jul 4, 2024Updated last year
- The Ecoacoustic Dataset from Arctic North Slope Alaska☆11May 29, 2025Updated 8 months ago
- PodcastMix: A dataset for separating music and speech in podcasts.☆44Aug 20, 2024Updated last year
- ☆12Nov 22, 2022Updated 3 years ago
- ☆117Updated this week
- "Artificial General Intelligence For All (AGIFA)" Project☆12Feb 25, 2024Updated last year
- The original weights of some Caffe models, ported to PyTorch.☆11Jan 18, 2022Updated 4 years ago
- ☆29Jul 4, 2025Updated 7 months ago
- Implementation of "Audio Retrieval with Natural Language Queries: A Benchmark Study".☆54Jul 16, 2025Updated 7 months ago
- Image-source method for room acoustics☆14Feb 5, 2020Updated 6 years ago
- [ICCV'21] The Right to Talk: An Audio-Visual Transformer Approach☆20Aug 2, 2021Updated 4 years ago
- Understanding and Tackling Hallucinations in Large Audio-Language Models | ICASSP 2025, Interspeech 2024☆32Mar 14, 2025Updated 11 months ago
- Inference codebase for "Cacophony: An Improved Contrastive Audio-Text Model". Preprint: https://arxiv.org/abs/2402.06986☆48Jan 19, 2026Updated 3 weeks ago
- The repo for "Class-aware Sounding Objects Localization", TPAMI 2021.☆29Mar 4, 2022Updated 3 years ago
- Source code of the paper: Video Inpainting Localization with Contrastive Learning, IEEE SPL 2025.☆12Aug 9, 2025Updated 6 months ago
- Spherical residual vector quantization (SRVQ)☆31Aug 25, 2024Updated last year
- Code for ICASSP 2024 Paper: RECAP: Retrieval-Augmented Audio Captioning☆16Jun 23, 2024Updated last year
- The source code of ExFunTube☆10Aug 8, 2025Updated 6 months ago
- ☆17May 31, 2023Updated 2 years ago
- Pronunciation-assisted Subword Modeling☆31May 30, 2019Updated 6 years ago
- A Multi-Task Evaluation Benchmark for Audio-Visual Representation Models (ICASSP 2024)☆58Apr 17, 2024Updated last year
- A curated list of audio-visual learning methods and datasets.☆285Dec 3, 2024Updated last year
- Audio-Visual Event Localization in Unconstrained Videos, ECCV 2018☆203Apr 3, 2021Updated 4 years ago
- A 6-million Audio-Caption Paired Dataset Built with an LLM- and ALM-based Automatic Pipeline☆195Dec 13, 2024Updated last year
- Unified Multisensory Perception: Weakly-Supervised Audio-Visual Video Parsing, ECCV, 2020. (Spotlight)☆89Jul 25, 2024Updated last year
- ☆14Sep 4, 2020Updated 5 years ago
- 🕵 Code for our EMNLP 2025 Main paper: "FlashAdventure: A Benchmark for GUI Agents Solving Full Story Arcs in Diverse Adventure Games"☆24Dec 14, 2025Updated 2 months ago
- ☆15Updated this week
- Unsupervised spoken sentence embeddings☆14Dec 14, 2022Updated 3 years ago
- Unofficial PyTorch implementation of "SCNet: Sparse Compression Network for Music Source Separation"☆62Apr 14, 2024Updated last year
- [INTERSPEECH 2025 Oral]Official code for "Accelerating Diffusion-based Text-to-Speech Model Training with Dual Modality Alignment"☆64Jun 16, 2025Updated 8 months ago
- [NIPS2023] Code and Model for VAST: A Vision-Audio-Subtitle-Text Omni-Modality Foundation Model and Dataset☆298Mar 14, 2024Updated last year
- ☆37Jun 30, 2022Updated 3 years ago
- Official implementation for AVGN☆40Mar 24, 2023Updated 2 years ago