researchmm/MM-Diffusion

[CVPR'23] MM-Diffusion: Learning Multi-Modal Diffusion Models for Joint Audio and Video Generation

☆453

Alternatives and similar repositories for MM-Diffusion

Users who are interested in MM-Diffusion are comparing it to the libraries listed below.

sihyun-yu / PVDM
[CVPR'23] Video Probabilistic Diffusion Models in Projected Latent Space
☆322 · May 14, 2024 · Updated 2 years ago
nihaomiao / CVPR23_LFDM
The pytorch implementation of our CVPR 2023 paper "Conditional Image-to-Video Generation with Latent Flow Diffusion Models"
☆471 · Jun 18, 2024 · Updated 2 years ago
mzsun01 / MM-LDM
☆11 · Apr 12, 2024 · Updated 2 years ago
XYPB / CondFoleyGen
Official PyTorch implementation of "Conditional Generation of Audio from Video via Foley Analogies".
☆93 · Dec 8, 2023 · Updated 2 years ago
lucidrains / video-diffusion-pytorch
Implementation of Video Diffusion Models, Jonathan Ho's new paper extending DDPMs to Video Generation - in Pytorch
☆1,384 · May 3, 2024 · Updated 2 years ago
guyyariv / TempoTokens
[AAAI 2024] The official PyTorch implementation of "Diverse and Aligned Audio-to-Video Generation via Text-to-Video Model Adaptation"
☆130 · May 18, 2026 · Updated 2 months ago
AndreyGuzhov / AudioCLIP
Source code for models described in the paper "AudioCLIP: Extending CLIP to Image, Text and Audio" (https://arxiv.org/abs/2106.13043)
☆872 · Sep 30, 2021 · Updated 4 years ago
yzxing87 / Seeing-and-Hearing
[CVPR 2024] Seeing and Hearing: Open-domain Visual-Audio Generation with Diffusion Latent Aligners
☆155 · Jul 6, 2024 · Updated 2 years ago
cdjkim / audiocaps
🔊 Repository for our NAACL-HLT 2019 paper: AudioCaps
☆215 · Oct 6, 2025 · Updated 9 months ago
ali-vilab / videocomposer
Official repo for VideoComposer: Compositional Video Synthesis with Motion Controllability
☆958 · Nov 11, 2023 · Updated 2 years ago
YingqingHe / LVDM
LVDM: Latent Video Diffusion Models for High-Fidelity Long Video Generation
☆503 · Nov 16, 2024 · Updated last year
mayuelala / FollowYourPose
[AAAI 2024] Follow-Your-Pose: This repo is the official implementation of "Follow-Your-Pose : Pose-Guided Text-to-Video Generation using …
☆1,356 · Mar 20, 2024 · Updated 2 years ago
showlab / Awesome-Video-Diffusion
A curated list of recent diffusion models for video generation, editing, and various other applications.
☆5,733 · Updated this week
DanBigioi / DiffusionVideoEditing
Official project repo for paper "Speech Driven Video Editing via an Audio-Conditioned Diffusion Model"
☆228 · Jun 30, 2023 · Updated 3 years ago
voletiv / mcvd-pytorch
Official implementation of MCVD: Masked Conditional Video Diffusion for Prediction, Generation, and Interpolation (https://arxiv.org/abs/…
☆370 · Sep 22, 2022 · Updated 3 years ago
CASIA-IVA-Lab / VALOR
[TPAMI2024] Codes and Models for VALOR: Vision-Audio-Language Omni-Perception Pretraining Model and Dataset
☆311 · Dec 25, 2024 · Updated last year
L-YeZhu / CDCD
[ICLR2023] Discrete Contrastive Diffusion for Cross-Modal Music and Image Generation (CDCD).
☆163 · Apr 5, 2023 · Updated 3 years ago
YBYBZhang / ControlVideo
[ICLR 2024] Official pytorch implementation of "ControlVideo: Training-free Controllable Text-to-Video Generation"
☆863 · Oct 12, 2023 · Updated 2 years ago
ChenHsing / Awesome-Video-Diffusion-Models
[CSUR] A Survey on Video Diffusion Models
☆2,304 · Jun 22, 2026 · Updated last month
thuhcsi / S2G-MDDiffusion
☆134 · Jul 8, 2024 · Updated 2 years ago
JIA-Lab-research / Video-P2P
Video-P2P: Video Editing with Cross-attention Control
☆431 · Jun 30, 2025 · Updated last year
Picsart-AI-Research / Text2Video-Zero
[ICCV 2023 Oral] Text-to-Image Diffusion Models are Zero-Shot Video Generators
☆4,246 · May 6, 2023 · Updated 3 years ago
ziqihuangg / Collaborative-Diffusion
[CVPR 2023] Collaborative Diffusion
☆441 · Oct 7, 2025 · Updated 9 months ago
haoheliu / AudioLDM
AudioLDM: Generate speech, sound effects, music and beyond, with text.
☆2,908 · Jun 25, 2025 · Updated last year
stoneMo / AVGN
Official implementation for AVGN
☆42 · Mar 24, 2023 · Updated 3 years ago
luosiallen / Diff-Foley
Diff-Foley: Synchronized Video-to-Audio Synthesis with Latent Diffusion Models
☆205 · May 29, 2024 · Updated 2 years ago
sukun1045 / video-physics-sound-diffusion
☆49 · Jul 10, 2024 · Updated 2 years ago
OpenNLPLab / TAVGBench
Demo page of TAVGBench: Benchmarking Text to Audible-Video Generation
☆15 · Apr 7, 2025 · Updated last year
YuanGongND / cav-mae
Code and Pretrained Models for ICLR 2023 Paper "Contrastive Audio-Visual Masked Autoencoder".
☆292 · Mar 20, 2024 · Updated 2 years ago
Advocate99 / DiffGesture
[CVPR'2023] Taming Diffusion Models for Audio-Driven Co-Speech Gesture Generation
☆265 · Mar 18, 2026 · Updated 4 months ago
guyyariv / AudioToken
[InterSpeech 2023] The official PyTorch implementation of: "AudioToken: Adaptation of Text-Conditioned Diffusion Models for Audio-to-Imag…
☆89 · May 18, 2026 · Updated 2 months ago
AILab-CVC / VideoCrafter
VideoCrafter2: Overcoming Data Limitations for High-Quality Video Diffusion Models
☆5,068 · Jan 9, 2026 · Updated 6 months ago
Weifeng-Chen / control-a-video
Official Implementation of "Control-A-Video: Controllable Text-to-Video Generation with Diffusion Models"
☆404 · Jul 4, 2023 · Updated 3 years ago
ChenyangQiQi / FateZero
[ICCV 2023 Oral] "FateZero: Fusing Attentions for Zero-shot Text-based Video Editing"
☆1,163 · Aug 14, 2023 · Updated 2 years ago
yhw-yhw / TalkSHOW
This is the official repository for TalkSHOW: Generating Holistic 3D Human Motion from Speech [CVPR2023].
☆371 · Nov 1, 2023 · Updated 2 years ago
ExponentialML / Text-To-Video-Finetuning
Finetune ModelScope's Text To Video model using Diffusers 🧨
☆699 · Dec 14, 2023 · Updated 2 years ago
gmkim-ai / Diffusion-Video-Autoencoders
An official implementation of "Diffusion Video Autoencoders: Toward Temporally Consistent Face Video Editing via Disentangled Video Encod…
☆152 · Oct 18, 2023 · Updated 2 years ago
LAION-AI / CLAP
Contrastive Language-Audio Pretraining
☆2,229 · May 15, 2025 · Updated last year
thu-ml / unidiffuser
Code and models for the paper "One Transformer Fits All Distributions in Multi-Modal Diffusion"
☆1,486 · May 31, 2023 · Updated 3 years ago