hkchengrex/MMAudio

[CVPR 2025] MMAudio: Taming Multimodal Joint Training for High-Quality Video-to-Audio Synthesis

☆2,241

Alternatives and similar repositories for MMAudio

Users who are interested in MMAudio are comparing it to the repositories listed below.

kijai / ComfyUI-MMAudio
☆574 · Feb 1, 2026 · Updated 5 months ago
declare-lab / TangoFlux
[ICLR 2026] TangoFlux: Super Fast and Faithful Text to Audio Generation with Flow Matching
☆876 · Jan 28, 2026 · Updated 5 months ago
open-mmlab / FoleyCrafter
[IJCV 2026] FoleyCrafter: Bring Silent Videos to Life with Lifelike and Synchronized Sounds. AI Foley master: add vivid, synchronized sound effects to your silent videos 😝
☆658 · Jun 15, 2026 · Updated last month
FunAudioLLM / ThinkSound
[NeurIPS 2025] PyTorch implementation of [ThinkSound], a unified framework for generating audio from any modality, guided by Chain-of-Tho…
☆1,373 · Apr 3, 2026 · Updated 3 months ago
v-iashin / Synchformer
Source code for "Synchformer: Efficient Synchronization from Sparse Cues" (ICASSP 2024)
☆130 · Sep 15, 2025 · Updated 10 months ago
ZeyueT / AudioX
[ICLR 2026] Repository of AudioX
☆1,542 · Mar 10, 2026 · Updated 4 months ago
multimodal-art-projection / YuE
YuE: Open Full-song Music Generation Foundation Model, something similar to Suno.ai but open
☆6,332 · Jun 4, 2025 · Updated last year
bytedance / LatentSync
Taming Stable Diffusion for Lip Sync!
☆5,895 · Jun 20, 2025 · Updated last year
Tencent-Hunyuan / HunyuanVideo
HunyuanVideo: A Systematic Framework For Large Video Generation Model
☆12,351 · Jun 29, 2026 · Updated 3 weeks ago
FunAudioLLM / FunMusic
A fundamental toolkit designed for music, song, and audio generation
☆1,369 · May 20, 2025 · Updated last year
ali-vilab / VACE
[ICCV 2025] Official implementations for paper: VACE: All-in-One Video Creation and Editing
☆3,870 · Oct 17, 2025 · Updated 9 months ago
Stability-AI / stable-audio-tools
Generative models for conditional audio generation
☆3,818 · Jul 13, 2026 · Updated last week
Fantasy-AMAP / fantasy-talking
[ACM MM 2025] FantasyTalking: Realistic Talking Portrait Generation via Coherent Motion Synthesis
☆1,623 · Jan 26, 2026 · Updated 5 months ago
ASLP-lab / DiffRhythm
Di♪♪Rhythm: Blazingly Fast and Embarrassingly Simple End-to-End Full-Length Song Generation with Latent Diffusion
☆2,321 · Nov 27, 2025 · Updated 7 months ago
luosiallen / Diff-Foley
Diff-Foley: Synchronized Video-to-Audio Synthesis with Latent Diffusion Models
☆205 · May 29, 2024 · Updated 2 years ago
SWivid / F5-TTS
Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"
☆14,981 · Jul 5, 2026 · Updated 2 weeks ago
NUS-HPC-AI-Lab / Enhance-A-Video
Enhance-A-Video: Better Generated Video for Free
☆598 · Mar 17, 2025 · Updated last year
ilpoviertola / V-AURA
The official implementation of V-AURA: Temporally Aligned Audio for Video with Autoregression (ICASSP 2025) (Oral)
☆35 · Feb 11, 2026 · Updated 5 months ago
Lightricks / LTX-Video
Official repository for LTX-Video
☆10,710 · Jan 5, 2026 · Updated 6 months ago
kijai / ComfyUI-HunyuanVideoWrapper
☆2,596 · Aug 20, 2025 · Updated 11 months ago
SkyworkAI / SkyReels-V1
SkyReels V1: The first and most advanced open-source human-centric video foundation model
☆2,692 · Mar 10, 2025 · Updated last year
cyanbx / Frieren-V2A
Implementation of Frieren: Efficient Video-to-Audio Generation Network with Rectified Flow Matching (NeurIPS'24)
☆62 · Apr 3, 2025 · Updated last year
Phantom-video / Phantom
Phantom: Subject-Consistent Video Generation via Cross-Modal Alignment
☆1,511 · Sep 11, 2025 · Updated 10 months ago
logtd / ComfyUI-LTXTricks
A set of ComfyUI nodes providing additional control for the LTX Video model
☆515 · Mar 5, 2025 · Updated last year
Lightricks / ComfyUI-LTXVideo
LTX-Video Support for ComfyUI
☆3,963 · Jun 30, 2026 · Updated 3 weeks ago
ace-step / ACE-Step
ACE-Step: A Step Towards Music Generation Foundation Model
☆4,670 · Feb 15, 2026 · Updated 5 months ago
haidog-yaqub / EzAudio
High-quality Text-to-Audio Generation with Efficient Diffusion Transformer
☆333 · Dec 17, 2025 · Updated 7 months ago
aigc-apps / EasyAnimate
📺 An End-to-End Solution for High-Resolution and Long Video Generation Based on Transformer Diffusion
☆2,267 · Mar 6, 2025 · Updated last year
MeiGen-AI / MultiTalk
[NeurIPS 2025] Let Them Talk: Audio-Driven Multi-Person Conversational Video Generation
☆2,968 · May 22, 2026 · Updated last month
zai-org / CogVideo
text and image to video generation: CogVideoX (2024) and CogVideo (ICLR 2023)
☆12,891 · Nov 4, 2025 · Updated 8 months ago
bytedance / Make-An-Audio-2
a text-conditional diffusion probabilistic model capable of generating high fidelity audio.
☆197 · May 29, 2024 · Updated 2 years ago
tdrussell / diffusion-pipe
A pipeline parallel training script for diffusion models.
☆1,997 · Jun 29, 2026 · Updated 3 weeks ago
genmoai / mochi
The best OSS video generation models, created by Genmo
☆3,698 · Nov 14, 2025 · Updated 8 months ago
Tencent / MimicMotion
High-Quality Human Motion Video Generation with Confidence-aware Pose Guidance
☆2,628 · Nov 18, 2025 · Updated 8 months ago
lllyasviel / FramePack
Lets make video diffusion practical!
☆17,119 · Oct 16, 2025 · Updated 9 months ago
Wan-Video / Wan2.1
Wan: Open and Advanced Large-Scale Video Generative Models
☆16,608 · Mar 5, 2026 · Updated 4 months ago
ivcylc / OpenMusic
OpenMusic: SOTA Text-to-music (TTM) Generation
☆630 · Jun 26, 2025 · Updated last year
antgroup / echomimic_v2
[CVPR 2025] EchoMimicV2: Towards Striking, Simplified, and Semi-Body Human Animation
☆4,617 · Feb 23, 2026 · Updated 4 months ago
hkchengrex / av-benchmark
Benchmarking for Audio-Text and Audio-Visual Generation; Supports FAD, FD_VGG, FD_PANNs, FD_PaSST, IS_PaSST, IS_PANNs, KL_PaSST, KL_PANNs…
☆79 · Feb 14, 2026 · Updated 5 months ago