multimodal-art-projection / YuE
YuE: Open Full-song Music Generation Foundation Model, something similar to Suno.ai but open
☆5,101 Updated 2 weeks ago
Alternatives and similar repositories for YuE
Users who are interested in YuE are comparing it to the repositories listed below.
- ACE-Step: A Step Towards Music Generation Foundation Model ☆2,459 Updated 2 weeks ago
- Towards Human-Sounding Speech ☆5,039 Updated last month
- Di♪♪Rhythm: Blazingly Fast and Embarrassingly Simple End-to-End Full-Length Song Generation with Latent Diffusion ☆1,720 Updated 3 weeks ago
- SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformer ☆4,264 Updated 2 weeks ago
- SoTA open-source TTS ☆8,100 Updated last week
- NotaGen: Advancing Musicality in Symbolic Music Generation with Large Language Model Training Paradigms ☆1,033 Updated 2 months ago
- InspireMusic: A toolkit designed for music, song, and audio generation ☆1,122 Updated last month
- Zonos-v0.1 is a leading open-weight text-to-speech model trained on more than 200k hours of varied multilingual speech, delivering expres… ☆6,753 Updated 3 months ago
- [CVPR 2025] MMAudio: Taming Multimodal Joint Training for High-Quality Video-to-Audio Synthesis ☆1,608 Updated last month
- A TTS model capable of generating ultra-realistic dialogue in one pass. ☆16,976 Updated 3 weeks ago
- ☆5,549 Updated last month
- MAGI-1: Autoregressive Video Generation at Scale ☆3,284 Updated this week
- Generative models for conditional audio generation ☆3,335 Updated 2 weeks ago
- Hibiki is a model for streaming speech translation (also known as simultaneous translation). Unlike offline translation—where one waits f… ☆1,113 Updated 2 months ago
- SkyReels-V2: Infinite-length Film Generative model ☆3,090 Updated 3 weeks ago
- ☆967 Updated last month
- Interface for OuteTTS models. ☆1,304 Updated 3 weeks ago
- Text-to-Music Generation with Rectified Flow Transformers ☆1,701 Updated 6 months ago
- Official implementations for paper: VACE: All-in-One Video Creation and Editing ☆2,648 Updated last month
- Wan: Open and Advanced Large-Scale Video Generative Models ☆12,263 Updated last week
- ☆3,025 Updated 3 months ago
- Kimi-Audio, an open-source audio foundation model excelling in audio understanding, generation, and conversation ☆3,818 Updated 2 weeks ago
- Moshi is a speech-text foundation model and full-duplex spoken dialogue framework. It uses Mimi, a state-of-the-art streaming neural audi… ☆8,460 Updated this week
- An AI-Powered Speech Processing Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Enhancement, Separation, and Target Spe… ☆2,940 Updated last week
- HunyuanVideo-I2V: A Customizable Image-to-Video Model based on HunyuanVideo ☆1,507 Updated last month
- Official repository for LTX-Video ☆6,696 Updated 3 weeks ago
- LTX-Video Support for ComfyUI ☆2,068 Updated last month
- A text-to-speech (TTS), speech-to-text (STT) and speech-to-speech (STS) library built on Apple's MLX framework, providing efficient speec… ☆2,399 Updated last week
- A Conversational Speech Generation Model ☆13,544 Updated 3 weeks ago
- SkyReels V1: The first and most advanced open-source human-centric video foundation model ☆2,207 Updated 3 months ago