kaist-ami / Sound2Scene

☆37

Alternatives and similar repositories for Sound2Scene

Users who are interested in Sound2Scene are comparing it to the repositories listed below.

BurakCanBiner / SonicDiffusion
☆38 Updated last year
ku-vai / TPoS
This repository is for The Power of Sound (TPoS): Audio Reactive Video Generation with Stable Diffusion (ICCV 2023)
☆24 Updated last year
lzhangbj / ASVA
[ECCV 2024 Oral] Audio-Synchronized Visual Animation
☆57 Updated last year
yzxing87 / Seeing-and-Hearing
[CVPR 2024] Seeing and Hearing: Open-domain Visual-Audio Generation with Diffusion Latent Aligners
☆151 Updated last year
kaist-ami / SoundBrush
☆10 Updated 7 months ago
schowdhury671 / melfusion
☆58 Updated last year
Sindhu-Hegde / jegal
Official code for the paper "Understanding Co-speech Gestures in-the-wild"
☆19 Updated 2 weeks ago
kaist-ami / SMILE-Dataset
[NAACL'24] Repository for "SMILE: Multimodal Dataset for Understanding Laughter in Video with Language Models"
☆14 Updated last year
x360dataset / x360dataset-kit
☆30 Updated 4 months ago
Tinglok / avstyle
Codebase for the Paper: Learning Visual Styles from Audio-Visual Associations (ECCV 2022, in PyTorch)
☆15 Updated 2 years ago
litwellchi / MMTrail
[Arxiv 2024] Official code for MMTrail: A Multimodal Trailer Video Dataset with Language and Music Descriptions
☆33 Updated 9 months ago
Minglu58 / TA2V
☆15 Updated 10 months ago
aspirinone / CATR.github.io
☆31 Updated last year
npurson / fid-metrics
A toolkit for computing Fréchet Inception Distance (FID) & Fréchet Video Distance (FVD) metrics.
☆39 Updated 5 months ago
WikiChao / DAVIS
[🏆 IJCV 2025 & ACCV 2024 Best Paper Honorable Mention] Official pytorch implementation of the paper "High-Quality Visually-Guided Sound …
☆22 Updated 2 weeks ago
HyelinNAM / ContrastiveDenoisingScore
[CVPR 2024] Official PyTorch implementation of "Contrastive Denoising Score (CDS) for Text-guided Latent Diffusion Image Editing"
☆116 Updated last year
lyndonzheng / CVQ-VAE
[ICCV 2023] Online Clustered Codebook
☆180 Updated last year
shim0114 / SSM-Meets-Video-Diffusion-Models
☆48 Updated 8 months ago
rxtan2 / AVSeT
☆17 Updated 2 years ago
L-YeZhu / CDCD
[ICLR2023] Discrete Contrastive Diffusion for Cross-Modal Music and Image Generation (CDCD).
☆162 Updated 2 years ago
scofield7419 / Dysen
CVPR 24 paper: Dysen-VDM: Empowering Dynamics-aware Text-to-Video Diffusion with LLMs
☆14 Updated last year
videodreamer23 / videodreamer23.github.io
☆29 Updated 2 years ago
Jyxarthur / shot-by-shot
[ICCV 2025] Official Implementation of "Shot-by-Shot: Film-Grammar-Aware Training-Free Audio Description Generation". Junyu Xie, Tengda H…
☆20 Updated 3 months ago
Neur-IO / OptVQ
Towards training VQ-VAE models robustly!
☆86 Updated 4 months ago
ThomasMrY / EncDiff
[NeurIPS 2024 Spotlight] code for "Diffusion Model with Cross Attention as an Inductive Bias for Disentanglement"
☆15 Updated 9 months ago
JingyuanYY / EmoGen
This is the official implementation of 2024 CVPR paper "EmoGen: Emotional Image Content Generation with Text-to-Image Diffusion Models".
☆88 Updated 2 weeks ago
luosiallen / Diff-Foley
Diff-Foley: Synchronized Video-to-Audio Synthesis with Latent Diffusion Models
☆196 Updated last year
cvlab-kaist / DirecT2V
☆80 Updated 2 years ago
ChanganVR / action2sound
Action2Sound: Ambient-Aware Generation of Action Sounds from Egocentric Videos
☆25 Updated last year
researchmm / MM-Diffusion
[CVPR'23] MM-Diffusion: Learning Multi-Modal Diffusion Models for Joint Audio and Video Generation
☆446 Updated last year