v-iashin/SparseSync

Source code for "Sparse in Space and Time: Audio-visual Synchronisation with Trainable Selectors." (Spotlight at the BMVC 2022)

☆56

Alternatives and similar repositories for SparseSync

Users who are interested in SparseSync are comparing it to the repositories listed below.

Franklin905 / VALOR
Research code for NeurIPS 2023 paper "Modality-Independent Teachers Meet Weakly-Supervised Audio-Visual Event Parser"
☆17 · Jul 13, 2025 · Updated last year
vskadandale / vocalist
Official repository for the paper VocaLiST: An Audio-Visual Synchronisation Model for Lips and Voices
☆73 · Apr 7, 2024 · Updated 2 years ago
ilpoviertola / V-AURA
The official implementation of V-AURA: Temporally Aligned Audio for Video with Autoregression (ICASSP 2025) (Oral)
☆35 · Feb 11, 2026 · Updated 5 months ago
PeihaoChen / regnet
Official PyTorch implementation of the TIP paper "Generating Visually Aligned Sound from Videos" and the corresponding Visually Aligned S…
☆53 · Dec 15, 2020 · Updated 5 years ago
XYPB / CondFoleyGen
Official PyTorch implementation of "Conditional Generation of Audio from Video via Foley Analogies".
☆93 · Dec 8, 2023 · Updated 2 years ago
tavihalperin / AV-sync
Python implementation of the paper "Dynamic Temporal Alignment of Speech to Lips"
☆32 · May 16, 2019 · Updated 7 years ago
hche11 / Localizing-Visual-Sounds-the-Hard-Way
Localizing Visual Sounds the Hard Way
☆84 · Jul 6, 2022 · Updated 4 years ago
IFICL / SLfM
Official code for the paper: [ICCV2023] Sound Localization from Motion: Jointly Learning Sound Direction and Camera Rotation
☆43 · Updated this week
GeWu-Lab / awesome-audiovisual-learning
A curated list of audio-visual learning methods and datasets.
☆288 · Dec 3, 2024 · Updated last year
epic-kitchens / epic-sounds-annotations
Splits for epic-sounds dataset
☆85 · Aug 2, 2025 · Updated 11 months ago
ariesssxu / vta-ldm
☆61 · Jun 15, 2025 · Updated last year
florianHofherr / PhysParamInference
☆19 · Jan 30, 2023 · Updated 3 years ago
JeongHun0716 / vsr-low
Visual Speech Recognition For Low-Resource Languages with Automatic Labels (ICASSP 2024)
☆17 · Mar 17, 2025 · Updated last year
roger-tseng / av-superb
A Multi-Task Evaluation Benchmark for Audio-Visual Representation Models (ICASSP 2024)
☆58 · Apr 17, 2024 · Updated 2 years ago
itsyoavshalev / End-to-End-Lip-Synchronization-with-a-Temporal-AutoEncoder
☆22 · Mar 31, 2022 · Updated 4 years ago
zcxu-eric / Ego4d_TalkNet_ASD
☆21 · Feb 15, 2022 · Updated 4 years ago
GeWu-Lab / LFAV
Towards Long Form Audio-visual Video Understanding
☆15 · Jan 16, 2026 · Updated 6 months ago
jasonppy / FaST-VGS-Family
Transformer-based visually grounded speech models
☆19 · Sep 22, 2022 · Updated 3 years ago
visipedia / ssw60
Sapsucker Woods 60 Audiovisual Dataset
☆19 · Oct 7, 2022 · Updated 3 years ago
WikiChao / DAVIS
[🏆 IJCV 2025 & ACCV 2024 Best Paper Honorable Mention] Official pytorch implementation of the paper "High-Quality Visually-Guided Sound …
☆33 · Mar 30, 2026 · Updated 3 months ago
WangHelin1997 / MaskSpec
The Pytorch implementation of paper: Masked Spectrogram Prediction For Self-Supervised Audio Pre-Training
☆51 · Dec 17, 2024 · Updated last year
v-iashin / SpecVQGAN
Source code for "Taming Visually Guided Sound Generation" (Oral at the BMVC 2021)
☆372 · Jul 12, 2024 · Updated 2 years ago
speedyseal / audiosetdl
Scripts for downloading AudioSet
☆89 · Nov 7, 2017 · Updated 8 years ago
andrewowens / multisensory
Code for the paper: Audio-Visual Scene Analysis with Self-Supervised Multisensory Features
☆225 · Jul 17, 2019 · Updated 7 years ago
GATECH-EIC / S3-Router
[NeurIPS 2022] "Losses Can Be Blessings: Routing Self-Supervised Speech Representations Towards Efficient Multilingual and Multitask Spee…
☆17 · Sep 19, 2023 · Updated 2 years ago
umbertocappellazzo / Llama-AVSR
Official Pytorch implementation of "Large Language Models are Strong Audio-Visual Speech Recognition Learners" [ICASSP 2025] and "Mitigat…
☆64 · Jan 18, 2026 · Updated 6 months ago
zfang399 / AlignNet
AlignNet: A Unifying Approach to Audio-Visual Alignment (WACV 2020)
☆34 · Jan 10, 2021 · Updated 5 years ago
v-iashin / Synchformer
Source code for "Synchformer: Efficient Synchronization from Sparse Cues" (ICASSP 2024)
☆130 · Sep 15, 2025 · Updated 10 months ago
UoA-CARES-Student / TalkingFaceGeneration-with-Emotion
Talking Face Generation system
☆17 · Oct 16, 2023 · Updated 2 years ago
amitakamath / vl_text_encoders_are_bottlenecks
Code and datasets for "Text encoders are performance bottlenecks in contrastive vision-language models". Coming soon!
☆11 · May 24, 2023 · Updated 3 years ago
TengdaHan / ActionClassification
Video action classification benchmark for common CNN architectures, implemented in PyTorch
☆12 · Jan 31, 2022 · Updated 4 years ago
cvlab-columbia / expert
Code for Learning to Learn Language from Narrated Video
☆33 · Oct 3, 2023 · Updated 2 years ago
uark-cviu / Right2Talk
[ICCV'21] The Right to Talk: An Audio-Visual Transformer Approach
☆20 · Aug 2, 2021 · Updated 4 years ago
SunnyCYC / aug4mss
☆13 · Dec 17, 2025 · Updated 7 months ago
DanielMengLiu / AudioVisualLip
☆25 · Feb 20, 2024 · Updated 2 years ago
GenjiB / LAVISH
Vision Transformers are Parameter-Efficient Audio-Visual Learners
☆107 · Aug 11, 2023 · Updated 2 years ago
yzxing87 / Seeing-and-Hearing
[CVPR 2024] Seeing and Hearing: Open-domain Visual-Audio Generation with Diffusion Latent Aligners
☆155 · Jul 6, 2024 · Updated 2 years ago
sony / CLIPSep
☆43 · Feb 21, 2023 · Updated 3 years ago
fuankarion / active-speakers-context
Code for the Active Speakers in Context Paper (CVPR2020)
☆58 · May 19, 2021 · Updated 5 years ago