ASLP-lab / DiffRhythm
Di♪♪Rhythm: Blazingly Fast and Embarrassingly Simple End-to-End Full-Length Song Generation with Latent Diffusion
☆1,268 Updated this week
Alternatives and similar repositories for DiffRhythm:
Users who are interested in DiffRhythm are comparing it to the repositories listed below.
- InspireMusic: A Unified Framework for Music, Song, Audio Generation. ☆1,010 Updated last week
- [CVPR 2025] Taming Multimodal Joint Training for High-Quality Video-to-Audio Synthesis ☆1,237 Updated 2 weeks ago
- TangoFlux: Super Fast and Faithful Text to Audio Generation with Flow Matching ☆690 Updated 3 weeks ago
- zero-shot voice conversion & singing voice conversion, with real-time support ☆2,052 Updated last week
- Interface for OuteTTS models. ☆957 Updated last month
- NotaGen: Advancing Musicality in Symbolic Music Generation with Large Language Model Training Paradigms ☆840 Updated last week
- A Fast TTS Engine ☆471 Updated 2 months ago
- Taming Stable Diffusion for Lip Sync! ☆3,317 Updated last week
- ☆4,054 Updated 2 weeks ago
- YuE: Open Full-song Music Generation Foundation Model, something similar to Suno.ai but open ☆4,547 Updated last week
- FoleyCrafter: Bring Silent Videos to Life with Lifelike and Synchronized Sounds. An AI Foley master that adds vivid, synchronized sound effects to your silent videos 😝 ☆561 Updated 8 months ago
- An Open-Sourced LLM-empowered Foundation TTS System ☆647 Updated 5 months ago
- ☆289 Updated 2 weeks ago
- Memory-Guided Diffusion for Expressive Talking Video Generation ☆765 Updated 2 months ago
- https://hf.co/hexgrad/Kokoro-82M ☆1,911 Updated this week
- SkyReels V1: The first and most advanced open-source human-centric video foundation model ☆1,884 Updated 2 weeks ago
- Hibiki is a model for streaming speech translation (also known as simultaneous translation). Unlike offline translation—where one waits f… ☆926 Updated last month
- Diffusion-based Portrait and Animal Animation ☆719 Updated 3 weeks ago
- YuE: Open Full-song Generation Foundation for the GPU Poor ☆346 Updated last month
- An AI-Powered Speech Processing Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Enhancement, Separation, and Target Spe… ☆2,444 Updated last month
- first base model for full-duplex conversational audio ☆1,725 Updated 2 months ago
- Official implementation of "Sonic: Shifting Focus to Global Audio Perception in Portrait Animation" ☆2,225 Updated 2 weeks ago
- ☆2,719 Updated last week
- HunyuanVideo-I2V: A Customizable Image-to-Video Model based on HunyuanVideo ☆1,174 Updated last week
- TTS with kokoro and onnx runtime ☆1,809 Updated 3 weeks ago
- OpenMusic: SOTA Text-to-music (TTM) Generation ☆543 Updated last month
- LLaSA: Scaling Train-time and Inference-time Compute for LLaMA-based Speech Synthesis ☆483 Updated 2 weeks ago
- An Industrial-Level Controllable and Efficient Zero-Shot Text-To-Speech System ☆277 Updated last month
- CogView4, CogView3-Plus and CogView3(ECCV 2024) ☆951 Updated last week
- A Training-free Iterative Framework for Long Story Visualization ☆858 Updated 2 months ago