tencent-ailab / SongGeneration
The official code repository for LeVo: High-Quality Song Generation with Multi-Preference Alignment
☆753 Updated this week
Alternatives and similar repositories for SongGeneration
Users who are interested in SongGeneration are comparing it to the repositories listed below.
- FoleyCrafter: Bring Silent Videos to Life with Lifelike and Synchronized Sounds. An AI Foley master that adds vivid, synchronized sound effects to your silent videos 😝 ☆628 Updated last year
- A fundamental toolkit designed for music, song, and audio generation ☆1,207 Updated 4 months ago
- TangoFlux: Super Fast and Faithful Text to Audio Generation with Flow Matching ☆786 Updated 2 months ago
- Long-form streaming TTS system for multi-speaker dialogue generation ☆722 Updated 2 weeks ago
- ☆383 Updated 2 weeks ago
- An Open-Sourced LLM-empowered Foundation TTS System ☆842 Updated 2 weeks ago
- ☆462 Updated 4 months ago
- [ICML 2025] SongGen: A Single Stage Auto-regressive Transformer for Text-to-Song Generation ☆262 Updated 3 months ago
- ☆316 Updated 5 months ago
- DICE-Talk is a diffusion-based emotional talking head generation method that can generate vivid and diverse emotions for speaking portrai… ☆258 Updated last month
- ☆453 Updated 4 months ago
- [ICCV 2025] Official PyTorch Implementation of FLOAT: Generative Motion Latent Flow Matching for Audio-driven Talking Portrait. ☆400 Updated 3 months ago
- MOSS-TTSD is a spoken dialogue generation model that enables expressive dialogue speech synthesis in both Chinese and English, supporting… ☆968 Updated 2 weeks ago
- EchoMimicV3: 1.3B Parameters are All You Need for Unified Multi-Modal and Multi-Task Human Animation ☆539 Updated 3 weeks ago
- ☆295 Updated last year
- Step-Audio 2 is an end-to-end multi-modal large language model designed for industry-strength audio understanding and speech conversation… ☆1,129 Updated last week
- Fast and High-Quality Zero-Shot Text-to-Speech with Flow Matching ☆657 Updated 2 weeks ago
- SkyReels-A1: Expressive Portrait Animation in Video Diffusion Transformers ☆561 Updated 3 months ago
- High-quality Text-to-Audio Generation with Efficient Diffusion Transformer ☆308 Updated 3 months ago
- LLaSA: Scaling Train-time and Inference-time Compute for LLaMA-based Speech Synthesis ☆616 Updated 5 months ago
- HuMo: Human-Centric Video Generation via Collaborative Multi-Modal Conditioning ☆636 Updated last week
- MiMo-Audio: Audio Language Models are Few-Shot Learners ☆687 Updated 2 weeks ago
- Fork of ACE-Step for LoRA training with < 10 GB VRAM ☆38 Updated last month
- VoxCPM: Tokenizer-Free TTS for Context-Aware Speech Generation and True-to-Life Voice Cloning ☆1,542 Updated this week
- Di♪♪Rhythm: Blazingly Fast and Embarrassingly Simple End-to-End Full-Length Song Generation with Latent Diffusion ☆1,945 Updated last month
- KeySync: A Robust Approach for Leakage-free Lip Synchronization in High Resolution ☆361 Updated last month
- [NeurIPS 2025] OmniTalker: Real-Time Text-Driven Talking Head Generation with In-Context Audio-Visual Style Replication ☆381 Updated 2 weeks ago
- Added vLLM support to IndexTTS for faster inference. ☆653 Updated this week
- VC Without Retrain! ☆128 Updated last year
- Extension of ChatTTS, 3x Faster on Windows, Support Voice Cloning and Mobile Deployment ☆169 Updated 7 months ago