Muse: Towards Reproducible Long-Form Song Generation with Fine-Grained Style Control
☆98Feb 18, 2026Updated 2 weeks ago
Alternatives and similar repositories for Muse
Users that are interested in Muse are comparing it to the libraries listed below
- ☆30Sep 15, 2025Updated 5 months ago
- ☆99Jan 19, 2026Updated last month
- Encode and decode audio samples to/from continuous and discrete compressed representations!☆104Nov 25, 2025Updated 3 months ago
- SyMuRBench: Benchmark for symbolic music representations☆17Nov 6, 2025Updated 3 months ago
- [AutoArk] GPA (General Purpose Audio) can do ASR, TTS and voice conversion with one tiny 300M model!☆87Jan 29, 2026Updated last month
- [ICASSP 2026] Task Vector in TTS: Toward Emotionally Expressive Dialectal Speech Synthesis☆36Dec 24, 2025Updated 2 months ago
- Training, inference, and testing of the SAC speech codec model.☆99Nov 1, 2025Updated 4 months ago
- Open-Ended Speaking Style Modeling via Fine-Grained and Multi-Granular Contrastive Language-Speech Pre-training☆64Feb 7, 2026Updated 3 weeks ago
- ☆156Nov 22, 2024Updated last year
- LEMAS‑TTS is a multilingual zero‑shot text‑to‑speech system, supporting 10 languages: Chinese English Spanish Russian French German Ital…☆91Jan 14, 2026Updated last month
- A Neural Audio Codec (NAC) for Universal Audio☆44May 30, 2025Updated 9 months ago
- This repository contains a series of works on diffusion-based speech tokenizers, including the official implementation of the paper: "TaD…☆76Jan 25, 2026Updated last month
- Ming-omni-tts: Simple and Efficient Unified Generation of Speech, Music, and Sound with Precise Control☆160Feb 26, 2026Updated last week
- Timbre Transfer using Denoising Diffusion Implicit Models (ISMIR 2023)☆28Mar 22, 2025Updated 11 months ago
- Improving Symbolic Music Generation with Inference-Time Alignment☆20Aug 2, 2025Updated 7 months ago
- This is the official implementation of MusicMamba.☆10Sep 18, 2024Updated last year
- BachDuet enables a human performer to improvise a duet counterpoint with a computer agent in real time.☆14Aug 8, 2022Updated 3 years ago
- ☆83Dec 31, 2025Updated 2 months ago
- ☆68Dec 30, 2025Updated 2 months ago
- Audio Embeddings as Teachers for Music Classification☆13Sep 7, 2023Updated 2 years ago
- (Experimental) Predicting hand assignments in piano MIDI using neural networks☆13Oct 11, 2024Updated last year
- [ICML 2025] SongGen: A Single Stage Auto-regressive Transformer for Text-to-Song Generation☆304Nov 5, 2025Updated 4 months ago
- An instruct text-to-speech solution based on LLaSA and CosyVoice2 developed by the ASLP lab and collaborators.☆222Updated this week
- Official Repository of Paper: "SynParaSpeech: Automated Synthesis of Paralinguistic Datasets for Speech Generation and Understanding" (IC…☆65Jan 27, 2026Updated last month
- Aligntune: A Modular Toolkit for Post Training Alignment of LLMs☆35Feb 26, 2026Updated last week
- ☆13May 16, 2021Updated 4 years ago
- Polyffusion: A Diffusion Model for Polyphonic Score Generation with Internal and External Controls☆87Jul 16, 2024Updated last year
- Official code for "Semantic-VAE: Semantic-Alignment Latent Representation for Better Speech Synthesis"☆108Dec 20, 2025Updated 2 months ago
- [ICLR 2026] Data Pipeline, Models, and Benchmark for Omni-Captioner.☆119Oct 17, 2025Updated 4 months ago
- ☆32Dec 29, 2020Updated 5 years ago
- End-to-end real-world polyphonic piano audio-to-score transcription with hierarchical decoding (IJCAI 2024)☆41Sep 17, 2024Updated last year
- ☆36Sep 6, 2025Updated 5 months ago
- FlowMirror-HydraVox — A natively accelerated multi-head autoregressive TTS system derived from CosyVoice 3.0. It predicts multiple tokens…☆38Feb 17, 2026Updated 2 weeks ago
- Front-end for symbolic music AI models☆17Nov 20, 2025Updated 3 months ago
- [INTERSPEECH 2025 Oral] Official code for "Accelerating Diffusion-based Text-to-Speech Model Training with Dual Modality Alignment"☆64Jun 16, 2025Updated 8 months ago
- SLMTokBench for paper "SpeechTokenizer: Unified Speech Tokenizer for Speech Large Language Models"☆37Aug 29, 2023Updated 2 years ago
- Tidy Tunes is an easy-to-use pipeline for mining high-quality audio data for speech generation models. To do so, it chains multiple open …☆22Feb 7, 2026Updated 3 weeks ago
- Descript Audio Codec - VAE Variant (.dac-vae): High-Fidelity Audio Compression with Variational Autoencoder☆31Aug 30, 2025Updated 6 months ago
- Official repo for CoVoMix: Advancing Zero-Shot Speech Generation for Human-like Multi-talker Conversations☆63Jan 16, 2025Updated last year