ASLP-lab / DiffRhythm
Di♪♪Rhythm: Blazingly Fast and Embarrassingly Simple End-to-End Full-Length Song Generation with Latent Diffusion
☆1,565 · Updated this week
Alternatives and similar repositories for DiffRhythm
Users who are interested in DiffRhythm are comparing it to the repositories listed below.
- InspireMusic: A Unified Framework for Music, Song, Audio Generation. ☆1,086 · Updated this week
- ACE-Step: A Step Towards Music Generation Foundation Model ☆1,766 · Updated this week
- [CVPR 2025] MMAudio: Taming Multimodal Joint Training for High-Quality Video-to-Audio Synthesis ☆1,486 · Updated this week
- ☆786 · Updated last week
- Interface for OuteTTS models. ☆1,214 · Updated 2 weeks ago
- TangoFlux: Super Fast and Faithful Text to Audio Generation with Flow Matching ☆717 · Updated 2 months ago
- YuE: Open Full-song Music Generation Foundation Model, something similar to Suno.ai but open ☆4,945 · Updated this week
- NotaGen: Advancing Musicality in Symbolic Music Generation with Large Language Model Training Paradigms ☆968 · Updated 3 weeks ago
- An Industrial-Level Controllable and Efficient Zero-Shot Text-To-Speech System ☆1,614 · Updated 2 weeks ago
- An AI-Powered Speech Processing Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Enhancement, Separation, and Target Spe… ☆2,717 · Updated 2 weeks ago
- TTS with kokoro and onnx runtime ☆1,960 · Updated this week
- YuE: Open Full-song Generation Foundation for the GPU Poor ☆385 · Updated 3 months ago
- SkyReels V1: The first and most advanced open-source human-centric video foundation model ☆2,142 · Updated 2 months ago
- Towards Human-Sounding Speech ☆4,703 · Updated last week
- FantasyTalking: Realistic Talking Portrait Generation via Coherent Motion Synthesis ☆1,073 · Updated this week
- ☆2,006 · Updated 2 weeks ago
- Taming Stable Diffusion for Lip Sync! ☆3,968 · Updated last week
- FoleyCrafter: Bring Silent Videos to Life with Lifelike and Synchronized Sounds. An AI Foley master that adds vivid, synchronized sound effects to your silent videos 😝 ☆578 · Updated 9 months ago
- Hibiki is a model for streaming speech translation (also known as simultaneous translation). Unlike offline translation—where one waits f… ☆1,024 · Updated 3 weeks ago
- HunyuanVideo-I2V: A Customizable Image-to-Video Model based on HunyuanVideo ☆1,399 · Updated 3 weeks ago
- Official implementation of "Sonic: Shifting Focus to Global Audio Perception in Portrait Animation" ☆2,654 · Updated last week
- Zero-shot voice conversion & singing voice conversion, with real-time support ☆2,419 · Updated 3 weeks ago
- Sonic is a method for "Shifting Focus to Global Audio Perception in Portrait Animation"; you can use it in ComfyUI ☆923 · Updated 2 months ago
- HunyuanCustom: A Multimodal-Driven Architecture for Customized Video Generation ☆328 · Updated this week
- A Training-free Iterative Framework for Long Story Visualization ☆888 · Updated 3 months ago
- Diffusion-based Portrait and Animal Animation ☆770 · Updated 2 months ago
- LTX-Video Support for ComfyUI ☆1,214 · Updated this week
- SkyReels-V2: Infinite-length Film Generative model ☆2,183 · Updated this week
- Memory-Guided Diffusion for Expressive Talking Video Generation ☆813 · Updated 3 months ago
- A SOTA open-source image editing model, which aims to provide comparable performance against the closed-source models like GPT-4o and Gem… ☆1,191 · Updated this week