Multimodal Music Generation with Explicit Bridges and Retrieval Augmentation: A framework for generating multimodal music by bridging different representations and enhancing generation with RAG.
☆28Jan 21, 2025Updated last year
Alternatives and similar repositories for MTM
Users that are interested in MTM are comparing it to the libraries listed below.
- Controllable Group Choreography using Contrastive Diffusion (SIGGRAPH ASIA 2023)☆18Nov 25, 2025Updated 4 months ago
- official code for CVPR'24 paper Diff-BGM☆71Oct 12, 2024Updated last year
- ISMIR 24 Supplementary Material☆14Oct 28, 2024Updated last year
- Repo for the IDESSAI 2024 course on modeling audio with discrete tokens.☆13Sep 13, 2024Updated last year
- Repo of the paper "Towards Building an End-to-End Multilingual Automatic Lyrics Transcription Model"☆15Jun 28, 2024Updated last year
- Humos paper repository☆26Sep 6, 2025Updated 6 months ago
- "Enhancing Neural Audio Fingerprint Robustness to Audio Degradation for Music Identification" ISMIR2025☆35Sep 11, 2025Updated 6 months ago
- ☆23Dec 10, 2024Updated last year
- KMM: Key Frame Mask Mamba for Extended Motion Generation☆19Sep 22, 2025Updated 6 months ago
- ☆20May 7, 2025Updated 10 months ago
- This is the official repository of Emotion-Driven Melody Harmonization via Melodic Variation and Functional Representation.☆12Sep 25, 2024Updated last year
- The ArtificialSongGenerator automatically composes and compiles the Artificial Audio Multitrack dataset (AAM).☆27Nov 17, 2025Updated 4 months ago
- ☆55Jul 16, 2025Updated 8 months ago
- MuChoMusic is a benchmark for evaluating music understanding in multimodal audio-language models.☆44Dec 3, 2024Updated last year
- Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor…☆60Apr 11, 2024Updated last year
- My Master's Project, a function/system/program that gives the structure of a given song (The pattern of repetition of verse, chorus, etc.…☆14Jun 21, 2019Updated 6 years ago
- ☆23Aug 5, 2025Updated 7 months ago
- [NeurIPS 2025] Official Implementation for "Enhancing Vision-Language Model Reliability with Uncertainty-Guided Dropout Decoding"☆22Dec 8, 2024Updated last year
- Codebase for the ICLR'23 paper "wav2tok: Deep Sequence Tokenizer for Audio Retrieval"☆36Feb 10, 2026Updated last month
- a notebook containing scripts, documentation, and examples for finetuning musicgen☆99Apr 10, 2024Updated last year
- [ICME 2025] DiffusionTalker: Efficient and Compact Speech-Driven 3D Talking Head via Personalizer-Guided Distillation☆24Mar 25, 2025Updated last year
- [ACL 2025] Can MLLMs Understand the Deep Implication Behind Chinese Images?☆21Oct 20, 2025Updated 5 months ago
- The official PyTorch implementation of "The 18th European Conference on Computer Vision" (ECCV 2024) paper Length-Aware Motion Synthesis …☆20Dec 15, 2024Updated last year
- Source code for the EMNLP 2025 paper “DM-Codec: Distilling Multimodal Representations for Speech Tokenization”☆56Jun 1, 2025Updated 9 months ago
- The official code for “Dance-to-Music Generation with Encoder-based Textual Inversion“☆22Jun 17, 2025Updated 9 months ago
- ScorePerformer: Expressive Piano Performance Rendering with Fine-Grained Control (ISMIR 2023)☆41Mar 10, 2025Updated last year
- Music Generation in MIDI format using Deep Learning.☆17Jun 22, 2024Updated last year
- This is the accompanying repository to the paper - Automatic Estimation of Singing Voice Musical Dynamics☆15Oct 28, 2024Updated last year
- Demo for DART, Audio Imagination workshop submission in NeurIPS 2024☆13Apr 15, 2025Updated 11 months ago
- arPLS algorithm from "Baseline correction using asymmetrically reweighted penalized least squares smoothing"☆15Mar 25, 2022Updated 4 years ago
- [ICMR 2025] Official Repository for The Paper, Let Network Decide What to Learn: Symbolic Music Understanding Model Based on Large-scale …☆18Aug 17, 2025Updated 7 months ago
- ∞-Video: A Training-Free Approach to Long Video Understanding via Continuous-Time Memory Consolidation☆19Feb 14, 2025Updated last year
- A library for computing Frechet Music Distance.☆29Feb 4, 2025Updated last year
- CLaMP 2: Multimodal Music Information Retrieval Across 101 Languages Using Large Language Models [NAACL 2025]☆60Feb 28, 2025Updated last year
- Salesforce AI Research's open diffusion language model☆59Oct 29, 2025Updated 4 months ago
- ☆57Oct 10, 2024Updated last year
- [Official Implementation] Acoustic Autoregressive Modeling 🔥☆75Aug 24, 2024Updated last year
- Code and data for paper "Exploring Hallucination of Large Multimodal Models in Video Understanding: Benchmark, Analysis and Mitigation".☆24Oct 22, 2025Updated 5 months ago
- TheGlueNote is a representation model for note-wise music alignment.☆12Jul 19, 2024Updated last year