postech-ami / Sound2Scene

☆32

Alternatives and similar repositories for Sound2Scene:

Users who are interested in Sound2Scene are comparing it to the repositories listed below.

BurakCanBiner / SonicDiffusion
☆28 Updated 3 months ago
lzhangbj / ASVA
[ECCV 2024 Oral] Audio-Synchronized Visual Animation
☆44 Updated 5 months ago
yzxing87 / Seeing-and-Hearing
[CVPR 2024] Seeing and Hearing: Open-domain Visual-Audio Generation with Diffusion Latent Aligners
☆137 Updated 7 months ago
guyyariv / AudioToken
This repo contains the official PyTorch implementation of AudioToken: Adaptation of Text-Conditioned Diffusion Models for Audio-to-Image …
☆80 Updated 8 months ago
schowdhury671 / melfusion
☆45 Updated 4 months ago
postech-ami / SMILE-Dataset
[NAACL'24] Repository for "SMILE: Multimodal Dataset for Understanding Laughter in Video with Language Models"
☆12 Updated 8 months ago
v-iashin / Synchformer
Source code for "Synchformer: Efficient Synchronization from Sparse Cues" (ICASSP 2024)
☆44 Updated 2 weeks ago
ku-vai / TPoS
This repository is for The Power of Sound (TPoS): Audio Reactive Video Generation with Stable Diffusion (ICCV 2023)
☆23 Updated last year
naver-ai / rewas
Official PyTorch implementation of ReWaS (AAAI'25) "Read, Watch and Scream! Sound Generation from Text and Video"
☆35 Updated 2 months ago
jacklishufan / InstructAny2Pix
PyTorch implementation of InstructAny2Pix: Flexible Visual Editing via Multimodal Instruction Following
☆28 Updated 3 weeks ago
XYPB / CondFoleyGen
Official PyTorch implementation of "Conditional Generation of Audio from Video via Foley Analogies".
☆82 Updated last year
aspirinone / CATR.github.io
☆32 Updated 11 months ago
Minglu58 / TA2V
☆16 Updated 2 months ago
ilpoviertola / V-AURA
The official implementation of V-AURA: Temporally Aligned Audio for Video with Autoregression (ICASSP 2025)
☆19 Updated last month
stoneMo / DeepAVFusion
Official codebase for "Unveiling the Power of Audio-Visual Early Fusion Transformers with Dense Interactions through Masked Modeling".
☆28 Updated 6 months ago
litwellchi / MMTrail
[Arxiv 2024] Official code for MMTrail: A Multimodal Trailer Video Dataset with Language and Music Descriptions
☆30 Updated 2 weeks ago
JingyuanYY / EmoGen
This is the official implementation of 2024 CVPR paper "EmoGen: Emotional Image Content Generation with Text-to-Image Diffusion Models".
☆70 Updated last month
ZeyueT / VidMuse
☆41 Updated 2 months ago
JacobChalk / TIM
Codebase for the paper: "TIM: A Time Interval Machine for Audio-Visual Action Recognition"
☆39 Updated 3 months ago
GenjiB / LAVISH
Vision Transformers are Parameter-Efficient Audio-Visual Learners
☆97 Updated last year
stoneMo / EZ-VSL
Official Codebase of "Localizing Visual Sounds the Easy Way" (ECCV 2022)
☆31 Updated 2 years ago
scofield7419 / Dysen
CVPR 24 paper: Dysen-VDM: Empowering Dynamics-aware Text-to-Video Diffusion with LLMs
☆10 Updated 11 months ago
markweberdev / maskbit
Implementation of the paper "MaskBit: Embedding-free Image Generation from Bit Tokens"
☆48 Updated 3 weeks ago
hxixixh / mix-and-localize
☆20 Updated 11 months ago
lsfhuihuiff / Dance-to-music_Siggraph_Asia_2024
The official code for “Dance-to-Music Generation with Encoder-based Textual Inversion”
☆18 Updated 2 weeks ago
Tinglok / avstyle
Codebase for the Paper: Learning Visual Styles from Audio-Visual Associations (ECCV 2022, in PyTorch)
☆14 Updated 2 years ago
viiika / Diffusion-Conductor
[AAAI 2023 Summer Symposium, Best Paper Award] Taming Diffusion Models for Music-driven Conducting Motion Generation
☆26 Updated 9 months ago
HyelinNAM / ContrastiveDenoisingScore
[CVPR2024] Official PyTorch implementation of "Contrastive Denoising Score(CDS) for Text-guided Latent Diffusion Image Editing"
☆106 Updated 3 months ago
YangLing0818 / ContextDiff
[ICLR 2024] Contextualized Diffusion Models for Text-Guided Image and Video Generation
☆62 Updated 8 months ago
rikeilong / Bay-CAT
[ECCV’24] Official Implementation for CAT: Enhancing Multimodal Large Language Model to Answer Questions in Dynamic Audio-Visual Scenario…
☆48 Updated 5 months ago