multimodal-art-projection / YuE
YuE: Open Full-song Music Generation Foundation Model, an open alternative similar to Suno.ai
☆4,547 · Updated last week
Alternatives and similar repositories for YuE:
Users interested in YuE are comparing it to the libraries listed below.
- Di♪♪Rhythm: Blazingly Fast and Embarrassingly Simple End-to-End Full-Length Song Generation with Latent Diffusion ☆1,268 · Updated this week
- TTS Towards Human-Sounding Speech ☆2,717 · Updated this week
- https://hf.co/hexgrad/Kokoro-82M ☆1,911 · Updated this week
- Zonos-v0.1 is a leading open-weight text-to-speech model trained on more than 200k hours of varied multilingual speech, delivering expres… ☆6,196 · Updated 3 weeks ago
- NotaGen: Advancing Musicality in Symbolic Music Generation with Large Language Model Training Paradigms ☆840 · Updated this week
- [CVPR 2025] Taming Multimodal Joint Training for High-Quality Video-to-Audio Synthesis ☆1,237 · Updated last week
- Zero-shot voice conversion & singing voice conversion, with real-time support ☆2,052 · Updated last week
- InspireMusic: A Unified Framework for Music, Song, Audio Generation. ☆1,010 · Updated last week
- Video Generation Foundation Models: https://saiyan-world.github.io/goku/ ☆2,746 · Updated last month
- Generative models for conditional audio generation ☆2,974 · Updated this week
- Hibiki is a model for streaming speech translation (also known as simultaneous translation). Unlike offline translation, where one waits f… ☆926 · Updated last month
- Dockerized FastAPI wrapper for Kokoro-82M text-to-speech model w/ CPU ONNX and NVIDIA GPU PyTorch support, handling, and auto-stitching ☆2,112 · Updated this week
- SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformer ☆3,796 · Updated this week
- Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching" ☆10,663 · Updated this week
- An AI-Powered Speech Processing Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Enhancement, Separation, and Target Spe… ☆2,444 · Updated last month
- The best OSS video generation models ☆3,044 · Updated 2 months ago
- Official repository for LTX-Video ☆3,189 · Updated 3 weeks ago
- ☆4,054 · Updated 2 weeks ago
- Various AI scripts. Mostly Stable Diffusion stuff. ☆4,376 · Updated this week
- Text-to-Music Generation with Rectified Flow Transformers ☆1,679 · Updated 3 months ago
- Local realtime voice AI ☆2,264 · Updated 3 weeks ago
- Taming Stable Diffusion for Lip Sync! ☆3,317 · Updated this week
- Spark-TTS Inference Code ☆6,062 · Updated last week
- The official repo of MiniMax-Text-01 and MiniMax-VL-01, large-language-model & vision-language-model based on Linear Attention ☆2,415 · Updated last week
- YuE: Open Full-song Generation Foundation for the GPU Poor ☆346 · Updated last month
- Wan: Open and Advanced Large-Scale Video Generative Models ☆9,018 · Updated this week
- ☆2,719 · Updated last week
- SkyReels V1: The first and most advanced open-source human-centric video foundation model ☆1,884 · Updated 2 weeks ago
- Interface for OuteTTS models. ☆957 · Updated last month
- ☆3,340 · Updated last month