v-iashin/Synchformer

Source code for "Synchformer: Efficient Synchronization from Sparse Cues" (ICASSP 2024)

☆110

Alternatives and similar repositories for Synchformer

Users who are interested in Synchformer are comparing it to the libraries listed below.

ilpoviertola / V-AURA
The official implementation of V-AURA: Temporally Aligned Audio for Video with Autoregression (ICASSP 2025) (Oral)
☆33 · Feb 11, 2026 · Updated 3 weeks ago
luosiallen / Diff-Foley
Diff-Foley: Synchronized Video-to-Audio Synthesis with Latent Diffusion Models
☆200 · May 29, 2024 · Updated last year
ariesssxu / vta-ldm
☆62 · Jun 15, 2025 · Updated 8 months ago
cyanbx / Frieren-V2A
Implementation of Frieren: Efficient Video-to-Audio Generation Network with Rectified Flow Matching (NeurIPS'24)
☆59 · Apr 3, 2025 · Updated 11 months ago
naver-ai / rewas
Official PyTorch implementation of ReWaS (AAAI'25) "Read, Watch and Scream! Sound Generation from Text and Video"
☆43 · Dec 13, 2024 · Updated last year
Ceaglex / LoVA
The code and weight for LoVA. LoVA is a novel model for Long-form Video-to-Audio generation. Based on the Diffusion Transformer (DiT) arc…
☆15 · Feb 27, 2025 · Updated last year
hkchengrex / av-benchmark
Benchmarking for Audio-Text and Audio-Visual Generation; Supports FAD, FD_VGG, FD_PANNs, FD_PaSST, IS_PaSST, IS_PANNs, KL_PaSST, KL_PANNs…
☆59 · Feb 14, 2026 · Updated 3 weeks ago
PolyPerceiver-Lab / STAV2A
☆19 · Aug 11, 2025 · Updated 6 months ago
heng-hw / V2A-Mapper
[AAAI 2024] V2A-Mapper: A Lightweight Solution for Vision-to-Audio Generation by Connecting Foundation Models
☆27 · Dec 14, 2023 · Updated 2 years ago
yzxing87 / Seeing-and-Hearing
[CVPR 2024] Seeing and Hearing: Open-domain Visual-Audio Generation with Diffusion Latent Aligners
☆155 · Jul 6, 2024 · Updated last year
XYPB / CondFoleyGen
Official PyTorch implementation of "Conditional Generation of Audio from Video via Foley Analogies".
☆93 · Dec 8, 2023 · Updated 2 years ago
v-iashin / SparseSync
Source code for "Sparse in Space and Time: Audio-visual Synchronisation with Trainable Selectors" (Spotlight at BMVC 2022)
☆54 · Jan 29, 2024 · Updated 2 years ago
lzhangbj / ASVA
[ECCV 2024 Oral] Audio-Synchronized Visual Animation
☆61 · Sep 12, 2024 · Updated last year
mdx-workshop / mdx-submissions21
Music Demixing Challenge Submission Repo
☆15 · Sep 8, 2023 · Updated 2 years ago
ku-vai / TPoS
This repository is for The Power of Sound (TPoS): Audio Reactive Video Generation with Stable Diffusion (ICCV 2023)
☆25 · Dec 7, 2023 · Updated 2 years ago
snap-research / AVLink
AV-Link: Temporally-Aligned Diffusion Features for Cross-Modal Audio-Video Generation
☆16 · Aug 3, 2025 · Updated 7 months ago
Stability-AI / stable-audio-metrics
Metrics for evaluating music and audio generative models – with a focus on long-form, full-band, and stereo generations.
☆284 · Jan 30, 2026 · Updated last month
Sreyan88 / ReCLAP
☆33 · Dec 23, 2025 · Updated 2 months ago
bytedance / Make-An-Audio-2
A text-conditional diffusion probabilistic model capable of generating high-fidelity audio.
☆188 · May 29, 2024 · Updated last year
eloimoliner / unconditional-diff-STFT
Unconditional music synthesis using a diffusion model in the STFT domain
☆12 · May 31, 2022 · Updated 3 years ago
sangho-vision / acav100m
ACAV100M: Automatic Curation of Large-Scale Datasets for Audio-Visual Video Representation Learning. In ICCV, 2021.
☆63 · Nov 18, 2021 · Updated 4 years ago
soham97 / PAM
PAM is a no-reference audio quality metric for audio generation tasks
☆76 · Jul 19, 2024 · Updated last year
ZeyueT / VidMuse
☆117 · Jun 7, 2025 · Updated 9 months ago
open-mmlab / FoleyCrafter
[IJCV] FoleyCrafter: Bring Silent Videos to Life with Lifelike and Synchronized Sounds. An AI Foley master that adds vivid, synchronized sound effects to your silent videos 😝
☆643 · Jul 26, 2024 · Updated last year
jaeyeonkim99 / EnCLAP
Official Implementation of EnCLAP (ICASSP 2024)
☆94 · Jun 2, 2024 · Updated last year
Sreyan88 / CompA
Code for ICLR 2024 Paper: CompA: Addressing the Gap in Compositional Reasoning in Audio-Language Models
☆22 · Jul 10, 2024 · Updated last year
Ego4DSounds / Ego4DSounds
Ego4DSounds: A diverse egocentric dataset with high action-audio correspondence
☆19 · Jun 14, 2024 · Updated last year
facebookresearch / FlowDec
A neural full-band audio codec for general audio sampled at 48 kHz with 7.5 kbps or 4.5 kbps.
☆198 · Jul 14, 2025 · Updated 7 months ago
InternLM / StarBench
[ICLR 2026] An official implementation of "STAR-Bench: Probing Deep Spatio-Temporal Reasoning as Audio 4D Intelligence"
☆40 · Jan 17, 2026 · Updated last month
ExplainableML / ZerAuCap
[NeurIPS 2023 - ML for Audio Workshop (Oral)] Zero-shot audio captioning with audio-language model guidance and audio context keywords
☆18 · Nov 30, 2024 · Updated last year
PeiwenSun2000 / Both-Ears-Wide-Open
The official repo for Both Ears Wide Open: Towards Language-Driven Spatial Audio Generation
☆60 · Jul 2, 2025 · Updated 8 months ago
hkchengrex / MMAudio
[CVPR 2025] MMAudio: Taming Multimodal Joint Training for High-Quality Video-to-Audio Synthesis
☆2,096 · Feb 23, 2026 · Updated last week
Text-to-Audio / Make-An-Audio-3
Make-An-Audio-3: Transforming Text/Video into Audio via Flow-based Large Diffusion Transformers
☆119 · May 19, 2025 · Updated 9 months ago
ms-dot-k / TMT
TMT: Tri-Modal Translation between Speech, Image, and Text by Processing Different Modalities as Different Languages
☆18 · May 23, 2024 · Updated last year
RoySheffer / im2wav
Official implementation of the pipeline presented in "I Hear Your True Colors: Image Guided Audio Generation"
☆125 · Jan 18, 2023 · Updated 3 years ago
JishengBai / ICME2024ASC
Baseline for IEEE ICME 2024 GC: Semi-supervised Acoustic Scene Classification under Domain Shift
☆18 · Mar 16, 2024 · Updated last year
zeyuxie29 / AudioTime
☆37 · Jul 4, 2024 · Updated last year
happylittlecat2333 / Auffusion
Official code and models of the paper "Auffusion: Leveraging the Power of Diffusion and Large Language Models for Text-to-Audio Generati…
☆193 · Mar 25, 2024 · Updated last year
snap-research / GenAU
☆50 · Apr 13, 2025 · Updated 10 months ago