sangho-vision/acav100m

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/sangho-vision/acav100m)

sangho-vision / acav100m

ACAV100M: Automatic Curation of Large-Scale Datasets for Audio-Visual Video Representation Learning. In ICCV, 2021.

☆64

Alternatives and similar repositories for acav100m

Users that are interested in acav100m are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

dayoon-ko / ExFunTube
View on GitHub
The source code of ExFunTube
☆10Aug 8, 2025Updated 11 months ago
sony / CLIPSep
View on GitHub
☆43Feb 21, 2023Updated 3 years ago
CompVis / visual-search
View on GitHub
Visual search interface
☆11Nov 30, 2021Updated 4 years ago
XinhaoMei / WavCaps
View on GitHub
This reporsitory contains metadata of WavCaps dataset and codes for downstream tasks.
☆264Jul 25, 2024Updated 2 years ago
voidful / llm-codec
View on GitHub
LLM-Codec: Neural Audio Codec Meets Language Model Objectives
☆23May 3, 2026Updated 2 months ago
Deploy open-source AI quickly and easily - Special Bonus Offer • Ad
Runpod Hub is built for open source. One-click deployment and autoscaling endpoints without provisioning your own infrastructure.
v-iashin / Synchformer
View on GitHub
Source code for "Synchformer: Efficient Synchronization from Sparse Cues" (ICASSP 2024)
☆130Sep 15, 2025Updated 10 months ago
dzryk / cliptalk
View on GitHub
☆19Aug 19, 2021Updated 4 years ago
YapengTian / AVE-ECCV18
View on GitHub
Audio-Visual Event Localization in Unconstrained Videos, ECCV 2018
☆210Apr 3, 2021Updated 5 years ago
akoepke / audio-retrieval-benchmark
View on GitHub
Code for "Audio Retrieval with Natural Language Queries: A Benchmark Study", Transactions on Multimedia 2022
☆54Jul 16, 2025Updated last year
zeyuxie29 / AudioTime
View on GitHub
☆39Jul 4, 2024Updated 2 years ago
stoneMo / AVGN
View on GitHub
Official implementation for AVGN
☆42Mar 24, 2023Updated 3 years ago
ttgeng233 / UnAV
View on GitHub
Dense-Localizing Audio-Visual Events in Untrimmed Videos: A Large-Scale Benchmark and Baseline (CVPR 2023)
☆73Jan 4, 2026Updated 6 months ago
GeWu-Lab / CSOL_TPAMI2021
View on GitHub
The repo for "Class-aware Sounding Objects Localization", TPAMI 2021.
☆29Mar 4, 2022Updated 4 years ago
v-iashin / SparseSync
View on GitHub
Source code for "Sparse in Space and Time: Audio-visual Synchronisation with Trainable Selectors." (Spotlight at the BMVC 2022)
☆56Jan 29, 2024Updated 2 years ago
Deploy on Railway without the complexity - Free Credits Offer • Ad
Connect your repo and Railway handles the rest with instant previews. Quickly provision container image services, databases, and storage volumes.
GeWu-Lab / awesome-audiovisual-learning
View on GitHub
A curated list of audio-visual learning methods and datasets.
☆289Dec 3, 2024Updated last year
YapengTian / AVVP-ECCV20
View on GitHub
Unified Multisensory Perception: Weakly-Supervised Audio-Visual Video Parsing, ECCV, 2020. (Spotlight)
☆90Jul 25, 2024Updated 2 years ago
JiwanChung / esper
View on GitHub
ESPER
☆24Mar 29, 2024Updated 2 years ago
Bizilizi / VGGSounder
View on GitHub
VGGSounder, a multi-label audio-visual classification dataset with modality annotations.
☆17Jun 30, 2026Updated 3 weeks ago
usc-sail / mica-subtitle-aligned-movie-sounds
View on GitHub
A dataset for Audio-Visual Sound Event Detection in Movies
☆26Jan 23, 2023Updated 3 years ago
wsntxxn / TextToAudioGrounding
View on GitHub
The dataset and baseline code for Text-to-Audio Grounding (TAG)
☆49Oct 23, 2025Updated 9 months ago
LiveBaster / agifa
View on GitHub
"Artificial General Intelligence For All (AGIFA)" Project
☆12Feb 25, 2024Updated 2 years ago
swagshaw / ASC-CL
View on GitHub
Official Pytorch Implementation for Continual Learning For On-Device Environmental Sound Classification
☆14Jul 19, 2022Updated 4 years ago
crowsonkb / pytorch-caffe-models
View on GitHub
The original weights of some Caffe models, ported to PyTorch.
☆11Jan 18, 2022Updated 4 years ago
Deploy to Railway using AI coding agents - Free Credits Offer • Ad
Use Claude Code, Codex, OpenCode, and more. Autonomous software development now has the infrastructure to match with Railway.
stoneMo / SLAVC
View on GitHub
Official Codebase of "A Closer Look at Weakly-Supervised Audio-Visual Source Localization" (NeurIPS 2022)
☆22Dec 6, 2022Updated 3 years ago
MTG / Podcastmix
View on GitHub
PodcastMix A dataset for separating music and speech in podcasts.
☆44Aug 20, 2024Updated last year
kuan2jiu99 / audio-hallucination
View on GitHub
Understanding and Tackling Hallucinations in Large Audio-Language Models | ICASSP 2025, Interspeech 2024
☆34Mar 14, 2025Updated last year
zeyuxie29 / SemanticVocoder
View on GitHub
☆28Apr 6, 2026Updated 3 months ago
hche11 / VGGSound
View on GitHub
VGGSound: A Large-scale Audio-Visual Dataset
☆359Sep 13, 2021Updated 4 years ago
hainan-xv / PASM
View on GitHub
Pronunciation-assisted Subword Modeling
☆31May 30, 2019Updated 7 years ago
CASIA-IVA-Lab / VAST
View on GitHub
[NIPS2023] Code and Model for VAST: A Vision-Audio-Subtitle-Text Omni-Modality Foundation Model and Dataset
☆302Mar 14, 2024Updated 2 years ago
stoneMo / MGN
View on GitHub
Official implementation for MGN
☆20Dec 22, 2022Updated 3 years ago
JishengBai / AudioSetCaps
View on GitHub
A 6-million Audio-Caption Paired Dataset Built with a LLMs and ALMs-based Automatic Pipeline
☆208Dec 13, 2024Updated last year
Deploy on Railway without the complexity - Free Credits Offer • Ad
Connect your repo and Railway handles the rest with instant previews. Quickly provision container image services, databases, and storage volumes.
wsntxxn / AudioCaption
View on GitHub
Audio captioning recipe
☆53Oct 23, 2025Updated 9 months ago
HS-YN / PanoAVQA
View on GitHub
Official repository of PanoAVQA: Grounded Audio-Visual Question Answering in 360° Videos (ICCV 2021)
☆16Oct 12, 2021Updated 4 years ago
GeWu-Lab / Crab
View on GitHub
[CVPR 2025] Crab: A Unified Audio-Visual Scene Understanding Model with Explicit Cooperation
☆85Dec 24, 2025Updated 7 months ago
lingjzhu / spoken_sent_embedding
View on GitHub
Unsupervised spoken sentence embeddings
☆14Dec 14, 2022Updated 3 years ago
qiuqiangkong / mini_llm
View on GitHub
☆29Jul 4, 2025Updated last year
gzhu06 / Cacophony
View on GitHub
Inference codebase for "Cacophony: An Improved Contrastive Audio-Text Model". Preprint: https://arxiv.org/abs/2402.06986
☆49Jan 19, 2026Updated 6 months ago
jiasenlu / vit-vqgan-jax
View on GitHub
Jax implementation of VIT-VQGAN
☆10Jan 25, 2024Updated 2 years ago