guyyariv / TempoTokens

This repo contains the official PyTorch implementation of: Diverse and Aligned Audio-to-Video Generation via Text-to-Video Model Adaptation

☆128

Alternatives and similar repositories for TempoTokens

Users who are interested in TempoTokens are comparing it to the repositories listed below.

luosiallen / Diff-Foley
Diff-Foley: Synchronized Video-to-Audio Synthesis with Latent Diffusion Models
☆198 Updated last year
happylittlecat2333 / Auffusion
Official codes and models of the paper "Auffusion: Leveraging the Power of Diffusion and Large Language Models for Text-to-Audio Generati…
☆190 Updated last year
XYPB / CondFoleyGen
Official PyTorch implementation of "Conditional Generation of Audio from Video via Foley Analogies".
☆91 Updated last year
guyyariv / AudioToken
This repo contains the official PyTorch implementation of AudioToken: Adaptation of Text-Conditioned Diffusion Models for Audio-to-Image …
☆87 Updated last year
v-iashin / Synchformer
Source code for "Synchformer: Efficient Synchronization from Sparse Cues" (ICASSP 2024)
☆95 Updated 2 months ago
ariesssxu / vta-ldm
☆62 Updated 5 months ago
lzhangbj / ASVA
[ECCV 2024 Oral] Audio-Synchronized Visual Animation
☆58 Updated last year
bytedance / Make-An-Audio-2
A text-conditional diffusion probabilistic model capable of generating high-fidelity audio.
☆180 Updated last year
yzxing87 / Seeing-and-Hearing
[CVPR 2024] Seeing and Hearing: Open-domain Visual-Audio Generation with Diffusion Latent Aligners
☆152 Updated last year
ZeyueT / VidMuse
☆105 Updated 5 months ago
sizhelee / Diff-BGM
Official code for the CVPR'24 paper Diff-BGM
☆71 Updated last year
naver-ai / rewas
Official PyTorch implementation of ReWaS (AAAI'25) "Read, Watch and Scream! Sound Generation from Text and Video"
☆44 Updated 11 months ago
HilaManor / AudioEditingCode
☆183 Updated 2 weeks ago
DavidMChan / Anim400K
Anim-400K: A dataset designed from the ground up for automated dubbing of video
☆110 Updated last year
heng-hw / V2A-Mapper
[AAAI 2024] V2A-Mapper: A Lightweight Solution for Vision-to-Audio Generation by Connecting Foundation Models
☆26 Updated last year
Text-to-Audio / Make-An-Audio-3
Make-An-Audio-3: Transforming Text/Video into Audio via Flow-based Large Diffusion Transformers
☆113 Updated 6 months ago
ilpoviertola / V-AURA
The official implementation of V-AURA: Temporally Aligned Audio for Video with Autoregression (ICASSP 2025) (Oral)
☆31 Updated 11 months ago
schowdhury671 / melfusion
☆58 Updated last year
OpenGVLab / LORIS
[ICML2023] Long-Term Rhythmic Video Soundtracker
☆61 Updated 4 months ago
RoySheffer / im2wav
Official implementation of the pipeline presented in "I hear your true colors: Image Guided Audio Generation"
☆124 Updated 2 years ago
zhuole1025 / SymMV
[ICCV 2023] Video Background Music Generation: Dataset, Method and Evaluation
☆77 Updated last year
mhamilton723 / DenseAV
Official code for the CVPR 2024 Paper: Separating the "Chirp" from the "Chat": Self-supervised Visual Grounding of Sound and Language
☆85 Updated last year
snap-research / GenAU
☆43 Updated 7 months ago
GalaxyCong / HPMDubbing
[CVPR 2023] Official code for paper: Learning to Dub Movies via Hierarchical Prosody Models.
☆110 Updated last year
glory20h / VoiceLDM
VoiceLDM: Text-to-Speech with Environmental Context
☆188 Updated last year
choijeongsoo / lip2speech-unit
[Interspeech 2023] Intelligible Lip-to-Speech Synthesis with Speech Units
☆47 Updated last year
ChanganVR / action2sound
Action2Sound: Ambient-Aware Generation of Action Sounds from Egocentric Videos
☆25 Updated last year
AMAAI-Lab / Video2Music
Video2Music: Suitable Music Generation from Videos using an Affective Multimodal Transformer model
☆188 Updated last year
cyanbx / Frieren-V2A
Implementation of Frieren: Efficient Video-to-Audio Generation Network with Rectified Flow Matching (NeurIPS'24)
☆55 Updated 8 months ago
thu-ml / Bridge-TTS
Official codebase for "Schrodinger Bridges Beat Diffusion Models on Text-to-Speech Synthesis" (https://arxiv.org/abs/2312.03491).
☆128 Updated last year