wzk1015/Awesome-Vision-to-Music-Generation

[ISMIR 2025] A curated list of vision-to-music generation: methods, datasets, evaluation and challenges.

☆126

Alternatives and similar repositories for Awesome-Vision-to-Music-Generation

Users who are interested in Awesome-Vision-to-Music-Generation are comparing it to the repositories listed below.

chouliuzuo / GVMGen
☆32 · Nov 10, 2025 · Updated 8 months ago
Apple-jun / FilmComposer
Music production for silent film clips.
☆34 · Apr 30, 2025 · Updated last year
zhuole1025 / SymMV
[ICCV 2023] Video Background Music Generation: Dataset, Method and Evaluation
☆78 · Mar 29, 2024 · Updated 2 years ago
ZeyueT / VidMuse
[CVPR 2025] Repository of VidMuse
☆140 · Jun 7, 2025 · Updated last year
NKU-HLT / AudioEditor
☆47 · Apr 2, 2025 · Updated last year
ETH-DISCO / blap
Official repo for BLAP: Bootstrapping Language-Audio Pre-training for Music Captioning, presented at ICASSP 2025
☆16 · Nov 18, 2024 · Updated last year
Stability-AI / stable-audio-metrics
Metrics for evaluating music and audio generative models – with a focus on long-form, full-band, and stereo generations.
☆300 · Updated this week
Xiaohao-Liu / Awesome-Vison2Audio
A curated list of Vision (video/image) to Audio Generation
☆107 · Feb 10, 2026 · Updated 5 months ago
nicolaus625 / FM4Music
The official GitHub page for the survey paper "Foundation Models for Music: A Survey".
☆224 · Sep 4, 2024 · Updated last year
Amshaker / Mobile-VideoGPT
Mobile-VideoGPT: Fast and Accurate Video Understanding Language Model
☆142 · Aug 6, 2025 · Updated 11 months ago
lsfhuihuiff / Dance-to-music_Siggraph_Asia_2024
The official code for "Dance-to-Music Generation with Encoder-based Textual Inversion"
☆23 · Jun 17, 2025 · Updated last year
justivanr / art2mus_
Art2Mus is a system that generates music based on digitized artworks and text by using the AudioLDM2 architecture with an added projectio…
☆20 · Oct 20, 2025 · Updated 9 months ago
music-x-lab / Self-Supervised-Metrical-Structure
☆15 · Sep 20, 2023 · Updated 2 years ago
heng-hw / V2A-Mapper
[AAAI 2024] V2A-Mapper: A Lightweight Solution for Vision-to-Audio Generation by Connecting Foundation Models
☆29 · Dec 14, 2023 · Updated 2 years ago
seungheondoh / lp-music-caps
LP-MusicCaps: LLM-Based Pseudo Music Captioning [ISMIR23]
☆348 · Apr 8, 2024 · Updated 2 years ago
Hannieliao / Baton
Official Repository of IJCAI 2024 Paper: "BATON: Aligning Text-to-Audio Model with Human Preference Feedback"
☆32 · Mar 4, 2025 · Updated last year
snap-research / GenAU
☆53 · Mar 24, 2026 · Updated 4 months ago
sudongtan / synesthesia
☆13 · Oct 3, 2023 · Updated 2 years ago
Sreyan88 / CompA
Code for ICLR 2024 Paper: CompA: Addressing the Gap in Compositional Reasoning in Audio-Language Models
☆23 · Jul 10, 2024 · Updated 2 years ago
fundwotsai2001 / AP-adapter
Audio Prompt Adapter: Unleashing music editing abilities for text-to-music with lightweight finetuning [ISMIR 2024]
☆57 · Nov 10, 2025 · Updated 8 months ago
RS2002 / PianoBart
[ICME 2024 oral] Official Repository for The Paper, PianoBART: Symbolic Piano Music Understanding and Generating with Large-Scale Pre-Tra…
☆23 · Aug 17, 2025 · Updated 11 months ago
CarlWangChina / MuChin
MuChin: A Chinese Colloquial Description Benchmark for Evaluating Language Models in the Field of Music
☆27 · Jan 7, 2026 · Updated 6 months ago
microsoft / fadtk
A simple library for Fréchet Audio Distance (FAD) calculation
☆266 · Aug 22, 2025 · Updated 11 months ago
bernardo-torres / linear-autoencoders
Official code and pretrained models for Linear Consistency Autoencoders (Lin-CAE), a method to induce linearity in audio autoencoders via…
☆17 · Feb 12, 2026 · Updated 5 months ago
DragonLiu1995 / video-to-audio-through-text
[NeurIPS 2024] Code, Dataset, Samples for the VATT paper "Tell What You Hear From What You See - Video to Audio Generation Through Text"
☆38 · Jul 24, 2025 · Updated last year
ldzhangyx / instruct-MusicGen
The official implementation of our paper "Instruct-MusicGen: Unlocking Text-to-Music Editing for Music Language Models via Instruction Tu…
☆109 · Jan 14, 2026 · Updated 6 months ago
hkchengrex / av-benchmark
Benchmarking for Audio-Text and Audio-Visual Generation; Supports FAD, FD_VGG, FD_PANNs, FD_PaSST, IS_PaSST, IS_PANNs, KL_PaSST, KL_PANNs…
☆80 · Feb 14, 2026 · Updated 5 months ago
HanxunH / AudioMosaic
[ICML2026] AudioMosaic: Contrastive Masked Audio Representation Learning
☆23 · May 15, 2026 · Updated 2 months ago
bytedance / Make-An-Audio-2
A text-conditional diffusion probabilistic model capable of generating high-fidelity audio.
☆197 · May 29, 2024 · Updated 2 years ago
sanderwood / clamp3
CLaMP 3: Universal Music Information Retrieval Across Unaligned Modalities and Unseen Languages [ACL 2025]
☆250 · May 11, 2025 · Updated last year
OpenGVLab / FluxViT
Make Your Training Flexible: Towards Deployment-Efficient Video Models
☆40 · Jun 11, 2025 · Updated last year
zxxwxyyy / sonique
Video Background Music Generation Using Unpaired Audio-Visual Data
☆33 · Oct 8, 2024 · Updated last year
NVIDIA / audio-flamingo
PyTorch implementation of Audio Flamingo: Series of Advanced Audio Understanding Language Models
☆1,164 · Dec 15, 2025 · Updated 7 months ago
ilpoviertola / V-AURA
The official implementation of V-AURA: Temporally Aligned Audio for Video with Autoregression (ICASSP 2025) (Oral)
☆35 · Feb 11, 2026 · Updated 5 months ago
Sreyan88 / ReCLAP
☆33 · Dec 23, 2025 · Updated 7 months ago
Pliploop / SLAP
Official repository for the paper - SLAP: Siamese Language-Audio Pretraining without negative samples for Music Understanding
☆63 · Sep 25, 2025 · Updated 10 months ago
OpenGVLab / LORIS
[ICML2023] Long-Term Rhythmic Video Soundtracker
☆63 · Jul 28, 2025 · Updated last year
a43992899 / openl2s
Open, royalty-free lyrics2song / song generation data collection / cleaning pipeline.
☆17 · May 9, 2025 · Updated last year
tencent-ailab / MuQ
Official repository of the paper "MuQ: Self-Supervised Music Representation Learning with Mel Residual Vector Quantization".
☆362 · Aug 4, 2025 · Updated 11 months ago