stoneMo/DeepAVFusion

Official codebase for "Unveiling the Power of Audio-Visual Early Fusion Transformers with Dense Interactions through Masked Modeling".

☆43

Alternatives and similar repositories for DeepAVFusion

Users interested in DeepAVFusion are comparing it to the repositories listed below.

stoneMo / OneAVM
Official Codebase of "A Unified Audio-Visual Learning Framework for Localization, Separation, and Recognition" (ICML 2023)
☆12 · Jun 1, 2023 · Updated 3 years ago
stoneMo / EZ-VSL
Official Codebase of "Localizing Visual Sounds the Easy Way" (ECCV 2022)
☆41 · Oct 2, 2022 · Updated 3 years ago
stoneMo / SLAVC
Official Codebase of "A Closer Look at Weakly-Supervised Audio-Visual Source Localization" (NeurIPS 2022)
☆21 · Dec 6, 2022 · Updated 3 years ago
YYX666660 / LAVSS
Code for LAVSS: Location-Guided Audio-Visual Spatial Audio Separation
☆19 · Feb 25, 2025 · Updated last year
denfed / heartheflow
Repository for the 2023 WACV paper: "Hear The Flow: Optical Flow-Based Self-Supervised Visual Sound Source Localization"
☆12 · Dec 21, 2022 · Updated 3 years ago
stoneMo / AVGN
Official implementation for AVGN
☆41 · Mar 24, 2023 · Updated 3 years ago
YuanGongND / cav-mae
Code and Pretrained Models for ICLR 2023 Paper "Contrastive Audio-Visual Masked Autoencoder".
☆292 · Mar 20, 2024 · Updated 2 years ago
OpenNLPLab / FNAC_AVL
[CVPR 2023] Official implementation of our paper - Learning Audio-Visual Source Localization via False Negative Aware Contrastive Learnin…
☆29 · Apr 10, 2023 · Updated 3 years ago
pedro-morgado / AVSpatialAlignment
☆31 · Jun 14, 2022 · Updated 4 years ago
ttgeng233 / UniAV
Unified Audio-Visual Perception for Multi-Task Video Localization
☆33 · Apr 19, 2024 · Updated 2 years ago
QinYang12 / SVGC-AVA
☆14 · Aug 17, 2024 · Updated last year
hxixixh / mix-and-localize
☆23 · Mar 20, 2024 · Updated 2 years ago
zhshj0110 / Awesome-Motion-Diffusion-Models
A collection of resources and papers on Motion Diffusion Models.
☆39 · Jun 10, 2025 · Updated last year
xiazhaoqiang / MULT-MicroExpressionSpot
This code is referring to a deep model that is used for micro-expression spotting with a CNN backbone and Transformer neck.
☆23 · Aug 2, 2023 · Updated 2 years ago
ilpoviertola / V-AURA
The official implementation of V-AURA: Temporally Aligned Audio for Video with Autoregression (ICASSP 2025) (Oral)
☆35 · Feb 11, 2026 · Updated 5 months ago
kaiw7 / STG-CMA
Towards Efficient Audio-Visual Learners via Empowering Pre-trained Vision Transformers with Cross-Modal Adaptation
☆15 · Apr 13, 2024 · Updated 2 years ago
lavendery / AudioComposer
☆27 · Sep 10, 2025 · Updated 10 months ago
vvvb-github / AVSegFormer
[AAAI 2024] AVSegFormer: Audio-Visual Segmentation with Transformer
☆74 · Mar 6, 2025 · Updated last year
weimengting / Preprocessing-of-Micro-Expressions
Pipeline for the preprocessing of Micro-Expressions.
☆12 · Apr 27, 2026 · Updated 2 months ago
WangHelin1997 / SpecAugment-plus
A Pytorch implementation of the paper : SpecAugment++: A Hidden Space Data Augmentation Method for Acoustic Scene Classification
☆34 · Jun 25, 2021 · Updated 5 years ago
hahaluluyo / Micro-expression-recognition-with-supervised-contrastive-learning
Micro-Expression Recognition based on Supervised Contrastive Learning
☆12 · Mar 27, 2024 · Updated 2 years ago
jinxiang-liu / anno-free-AVS
Official code for WACV 2024 paper, "Annotation-free Audio-Visual Segmentation"
☆38 · Oct 11, 2024 · Updated last year
SenHe / uavdvsm
☆15 · Nov 23, 2020 · Updated 5 years ago
lzhangbj / ASVA
[ECCV 2024 Oral] Audio-Synchronized Visual Animation
☆60 · Mar 15, 2026 · Updated 4 months ago
Bizilizi / VGGSounder
VGGSounder, a multi-label audio-visual classification dataset with modality annotations.
☆17 · Jun 30, 2026 · Updated 3 weeks ago
SUDERS / MAWNO
This repo is the relevant code for MAWNO
☆18 · Apr 21, 2024 · Updated 2 years ago
zeroone-universe / TowardsRobustSpeechSR
Unofficial Pytorch Lightning Implementation of "Towards Robust Speech Super-Resolution"
☆10 · May 8, 2023 · Updated 3 years ago
yannqi / COMBO-AVS
[CVPR 2024 Highlight] Official implementation of the paper: Cooperation Does Matter: Exploring Multi-Order Bilateral Relations for Audio-…
☆40 · Apr 20, 2025 · Updated last year
rikeilong / Bay-CAT
[ECCV’24] Official Implementation for CAT: Enhancing Multimodal Large Language Model to Answer Questions in Dynamic Audio-Visual Scenario…
☆59 · Sep 4, 2024 · Updated last year
IFICL / SLfM
Official code for the paper: [ICCV2023] Sound Localization from Motion: Jointly Learning Sound Direction and Camera Rotation
☆43 · Updated this week
fdbtrs / Unsupervised-Face-Recognition-using-Unlabeled-Synthetic-Data
Unsupervised Face Recognition using Unlabeled Synthetic Data
☆26 · Mar 14, 2023 · Updated 3 years ago
lijuncheng16 / AudioTaggingDoneRight
experiments about AudioSet
☆43 · Jul 22, 2023 · Updated 2 years ago
XinleiRen / MTFAA-Net
An unofficial non-causal Tensorflow implementation of "Multi-Scale Temporal Frequency Convolutional Network With Axial Attention for Spee…
☆14 · Dec 27, 2022 · Updated 3 years ago
haoyi-duan / DG-SCT
NeurIPS'2023 official implementation code
☆70 · Nov 11, 2023 · Updated 2 years ago
StevenHickson / CreateNormals
☆11 · Nov 22, 2019 · Updated 6 years ago
roex-audio / roex-python
A Python package for advanced audio processing, enabling users to mix, master, and apply sound engineering techniques with ease
☆21 · May 15, 2026 · Updated 2 months ago
GuanghaoZhu663 / SKD-TSTSAN
☆23 · May 8, 2025 · Updated last year
gyx-gloria / DMT
Official Implementation of DMT: Dual Mean-Teacher in PyTorch.
☆10 · Oct 27, 2023 · Updated 2 years ago
nianfd / RWKV-VG
☆10 · Dec 3, 2024 · Updated last year