edsonroteia / cav-mae-syncLinks

[CVPR25] Official Implementation of CAV-MAE Sync

☆25

Alternatives and similar repositories for cav-mae-sync

Users that are interested in cav-mae-sync are comparing it to the libraries listed below

Sorting:

mhamilton723 / DenseAV
Offical code for the CVPR 2024 Paper: Separating the "Chirp" from the "Chat": Self-supervised Visual Grounding of Sound and Language
☆83Updated last year
ahaliassos / usr
Official implementation of USR (NeurIPS 2024)
☆35Updated 9 months ago
schowdhury671 / meerkat
☆33Updated 3 months ago
kaistmm / SSLalignment
☆35Updated 4 months ago
guyyariv / AudioToken
This repo contains the official PyTorch implementation of AudioToken: Adaptation of Text-Conditioned Diffusion Models for Audio-to-Image …
☆85Updated last year
JacobChalk / TIM
Codebase for the paper: "TIM: A Time Interval Machine for Audio-Visual Action Recognition"
☆46Updated 11 months ago
stoneMo / MGN
Official implementation for MGN
☆20Updated 2 years ago
haoyi-duan / DG-SCT
NeurIPS'2023 official implementation code
☆66Updated last year
GenjiB / LAVISH
Vision Transformers are Parameter-Efficient Audio-Visual Learners
☆103Updated 2 years ago
TengdaHan / AutoAD
[CVPR'23 Highlight] AutoAD: Movie Description in Context.
☆100Updated 11 months ago
YuanGongND / cav-mae
Code and Pretrained Models for ICLR 2023 Paper "Contrastive Audio-Visual Masked Autoencoder".
☆271Updated last year
ChanganVR / action2sound
Action2Sound: Ambient-Aware Generation of Action Sounds from Egocentric Videos
☆25Updated last year
stoneMo / DeepAVFusion
Official codebase for "Unveiling the Power of Audio-Visual Early Fusion Transformers with Dense Interactions through Masked Modeling".
☆34Updated last year
LoieSun / Auto-ACD
code for A Large-scale Dataset for Audio-Language Representation Learning
☆14Updated last year
v-iashin / SparseSync
Source code for "Sparse in Space and Time: Audio-visual Synchronisation with Trainable Selectors." (Spotlight at the BMVC 2022)
☆53Updated last year
BriansIDP / AudioVisualLLM
☆19Updated last year
ahaliassos / raven
Official implementation of RAVEn (ICLR 2023) and BRAVEn (ICASSP 2024)
☆70Updated 7 months ago
facebookresearch / MAViL
The repo host the code and model of MAViL.
☆44Updated 2 years ago
stoneMo / AVGN
Official implementation for AVGN
☆37Updated 2 years ago
rxtan2 / AVSeT
☆17Updated 2 years ago
rikeilong / Bay-CAT
[ECCV’24] Official Implementation for CAT: Enhancing Multimodal Large Language Model to Answer Questions in Dynamic Audio-Visual Scenario…
☆56Updated last year
roudimit / c2kd
Code for the C2KD paper (ICASSP 2023)
☆18Updated 2 years ago
lzhangbj / ASVA
[ECCV 2024 Oral] Audio-Synchronized Visual Animation
☆56Updated last year
YuanGongND / uavm
Code for the IEEE Signal Processing Letters 2022 paper "UAVM: Towards Unifying Audio and Visual Models".
☆55Updated 2 years ago
v-iashin / Synchformer
Source code for "Synchformer: Efficient Synchronization from Sparse Cues" (ICASSP 2024)
☆90Updated last month
stoneMo / EZ-VSL
Official Codebase of "Localizing Visual Sounds the Easy Way" (ECCV 2022)
☆37Updated 3 years ago
Ego4DSounds / Ego4DSounds
Ego4DSounds: A diverse egocentric dataset with high action-audio correspondence
☆18Updated last year
epic-kitchens / epic-sounds-annotations
Splits for epic-sounds dataset
☆83Updated 2 months ago
GeWu-Lab / PSTP-Net
☆17Updated 2 years ago
ilpoviertola / V-AURA
The official implementation of V-AURA: Temporally Aligned Audio for Video with Autoregression (ICASSP 2025) (Oral)
☆29Updated 9 months ago