TiffanyBlews / MozartsTouch
Official implementation of Mozart's Touch: A Lightweight Multi-modal Music Generation Framework Based on Pre-Trained Large Models
☆31 Updated 3 weeks ago
Alternatives and similar repositories for MozartsTouch:
Users who are interested in MozartsTouch are comparing it to the repositories listed below
- Findings of ACL 2023 | AlignSTS: a speech-to-singing (STS) model based on modality disentanglement and cross-modal alignment ☆67 Updated 8 months ago
- Music generation ☆24 Updated 10 months ago
- LAFMA: A Latent Flow Matching Model for Text-to-Audio Generation (INTERSPEECH 2024) ☆38 Updated 9 months ago
- Robust Singing Voice Transcription and MIDI Extraction ☆72 Updated 4 months ago
- This is the official train-dev-test release of the Interspeech2024 Discrete Speech Representation Challenge. ☆32 Updated last year
- ☆46 Updated 2 months ago
- The source code for the paper XiaoiceSing2 (interspeech2023) ☆48 Updated last year
- VoxInstruct: Expressive Human Instruction-to-Speech Generation with Unified Multilingual Codec Language Modelling ☆67 Updated 4 months ago
- VAE modified from Descript Audio Codec, which replaces the RVQ with VAE ☆66 Updated 11 months ago
- ☆80 Updated 4 months ago
- Implementation of RIFT-SVC, a singing voice conversion model based on Rectified Flow Transformer. ☆36 Updated last week
- Video Background Music Generation Using Unpaired Audio-Visual Data ☆23 Updated 5 months ago
- [ICASSP 2025] FreeSVC: Towards Zero-shot Multilingual Singing Voice Conversion ☆57 Updated last month
- ☆18 Updated 3 weeks ago
- small audio language model for reasoning ☆49 Updated last week
- ☆15 Updated last month
- Bilingual Singing Voice Synthesis ☆18 Updated last year
- Implementation of Multi-Source Music Generation with Latent Diffusion. ☆22 Updated 6 months ago
- ☆26 Updated 6 months ago
- Art2Mus is a system that generates music based on digitized artworks and text by using the AudioLDM2 architecture with an added projectio… ☆15 Updated 3 months ago
- ☆20 Updated 5 months ago
- ☆38 Updated 6 months ago
- ☆12 Updated last year
- UMETTS: A Unified Framework for Emotional Text-to-Speech Synthesis with Multimodal Prompts ☆22 Updated 3 months ago
- Generative Expressive Conversational Speech Synthesis (Accepted by MM'2024) ☆55 Updated 4 months ago
- Official repo of ICASSP 2024 paper - Generative De-Quantization for Neural Speech Codec via Latent Diffusion. ☆51 Updated 2 months ago
- [NAACL 2025] WaveFM: A High-Fidelity and Efficient Vocoder Based on Flow Matching ☆50 Updated this week
- This repository contains prompts & best practices to annotate audio clips with a very high degree of details using Audio-Language-Models ☆32 Updated 5 months ago
- Official repository for the paper Singing Voice Graph Modeling for SingFake Detection (Interspeech 2024). ☆25 Updated 6 months ago
- ☆58 Updated 4 months ago