YuanGongND / cav-mae

Code and Pretrained Models for ICLR 2023 Paper "Contrastive Audio-Visual Masked Autoencoder".

☆276

Alternatives and similar repositories for cav-mae

Users who are interested in cav-mae are comparing it to the repositories listed below.

hche11 / VGGSound
VGGSound: A Large-scale Audio-Visual Dataset
☆341 Updated 4 years ago
facebookresearch / AudioMAE
This repo hosts the code and models of "Masked Autoencoders that Listen".
☆628 Updated last year
GeWu-Lab / awesome-audiovisual-learning
A curated list of audio-visual learning methods and datasets.
☆275 Updated 11 months ago
facebookresearch / MAViL
This repo hosts the code and model of MAViL.
☆44 Updated 2 years ago
XinhaoMei / WavCaps
This repository contains metadata of the WavCaps dataset and code for downstream tasks.
☆251 Updated last year
stoneMo / AVGN
Official implementation for AVGN
☆37 Updated 2 years ago
stoneMo / DeepAVFusion
Official codebase for "Unveiling the Power of Audio-Visual Early Fusion Transformers with Dense Interactions through Masked Modeling".
☆35 Updated last year
speedyseal / audiosetdl
Scripts for downloading AudioSet
☆86 Updated 8 years ago
GenjiB / LAVISH
Vision Transformers are Parameter-Efficient Audio-Visual Learners
☆106 Updated 2 years ago
descriptinc / lyrebird-wav2clip
Official implementation of the paper WAV2CLIP: LEARNING ROBUST AUDIO REPRESENTATIONS FROM CLIP
☆354 Updated 3 years ago
dlrudco / Fast-Audioset-Download
Download AudioSet data quickly with youtube-dl, ffmpeg, and Python multiprocessing
☆41 Updated last year
hxixixh / mix-and-localize
☆21 Updated last year
the-anonymous-bs / av-SALMONN
av-SALMONN: Speech-Enhanced Audio-Visual Large Language Models
☆13 Updated last year
YuanGongND / ssast
Code for the AAAI 2022 paper "SSAST: Self-Supervised Audio Spectrogram Transformer".
☆404 Updated 3 years ago
stoneMo / EZ-VSL
Official Codebase of "Localizing Visual Sounds the Easy Way" (ECCV 2022)
☆37 Updated 3 years ago
akoepke / audio-retrieval-benchmark
Implementation of "Audio Retrieval with Natural Language Queries: A Benchmark Study".
☆52 Updated 4 months ago
cdjkim / audiocaps
🔊 Repository for our NAACL-HLT 2019 paper: AudioCaps
☆196 Updated last month
stoneMo / SLAVC
Official Codebase of "A Closer Look at Weakly-Supervised Audio-Visual Source Localization" (NeurIPS 2022)
☆20 Updated 2 years ago
ahaliassos / raven
Official implementation of RAVEn (ICLR 2023) and BRAVEn (ICASSP 2024)
☆76 Updated 8 months ago
cwx-worst-one / EAT
[IJCAI 2024] EAT: Self-Supervised Pre-Training with Efficient Audio Transformer
☆192 Updated 4 months ago
ExplainableML / AVCA-GZSL
This repository contains the code for our CVPR 2022 paper on "Audio-visual Generalised Zero-shot Learning with Cross-modal Attention and …
☆40 Updated 2 years ago
YuanGongND / uavm
Code for the IEEE Signal Processing Letters 2022 paper "UAVM: Towards Unifying Audio and Visual Models".
☆57 Updated 2 years ago
pritamqu / CrissCross
[AAAI 2023 (Oral)] CrissCross: Self-Supervised Audio-Visual Representation Learning with Relaxed Cross-Modal Synchronicity
☆25 Updated 2 years ago
nttcslab / msm-mae
Masked Spectrogram Modeling using Masked Autoencoders for Learning General-purpose Audio Representations
☆96 Updated last year
IFICL / stereocrw
Code for the Paper: [ECCV2022] Sound Localization by Self-Supervised Time-Delay Estimation
☆23 Updated 2 years ago
YuanGongND / psla
Code for the TASLP paper "PSLA: Improving Audio Tagging With Pretraining, Sampling, Labeling, and Aggregation".
☆149 Updated 2 years ago
v-iashin / SparseSync
Source code for "Sparse in Space and Time: Audio-visual Synchronisation with Trainable Selectors." (Spotlight at the BMVC 2022)
☆53 Updated last year
AndreyGuzhov / ESResNeXt-fbsp
Source code for models described in the paper "ESResNe(X)t-fbsp: Learning Robust Time-Frequency Transformation of Audio" (https://arxiv.o…
☆46 Updated 4 years ago
ubc-vision / TriBERT
Code Release for the paper "TriBERT: Full-body Human-centric Audio-visual Representation Learning for Visual Sound Separation" in NeurIPS…
☆14 Updated 3 years ago
raymin0223 / patch-mix_contrastive_learning
Patch-Mix Contrastive Learning with Audio Spectrogram Transformer on Respiratory Sound Classification (INTERSPEECH 2023)
☆71 Updated 8 months ago