bytedance / MegaTTS3
☆4,942 · Updated 2 weeks ago
Alternatives and similar repositories for MegaTTS3:
Users who are interested in MegaTTS3 are comparing it to the libraries listed below.
- Towards Human-Sounding Speech ☆4,490 · Updated last week
- ☆4,213 · Updated last month
- An Industrial-Level Controllable and Efficient Zero-Shot Text-To-Speech System ☆1,346 · Updated this week
- Spark-TTS Inference Code ☆8,793 · Updated 2 weeks ago
- Zero-shot voice conversion & singing voice conversion, with real-time support ☆2,289 · Updated this week
- Multilingual large voice generation model, providing full-stack inference, training, and deployment capabilities. ☆13,204 · Updated last week
- https://hf.co/hexgrad/Kokoro-82M ☆2,432 · Updated 2 weeks ago
- An AI-Powered Speech Processing Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Enhancement, Separation, and Target Spe… ☆2,644 · Updated last week
- Taming Stable Diffusion for Lip Sync! ☆3,725 · Updated last week
- Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching" ☆11,448 · Updated last week
- Official implementation of "Sonic: Shifting Focus to Global Audio Perception in Portrait Animation" ☆2,577 · Updated last month
- Di♪♪Rhythm: Blazingly Fast and Embarrassingly Simple End-to-End Full-Length Song Generation with Latent Diffusion ☆1,475 · Updated this week
- YuE: Open Full-song Music Generation Foundation Model, something similar to Suno.ai but open ☆4,842 · Updated 2 weeks ago
- GLM-4-Voice | End-to-end Chinese-English spoken dialogue model ☆2,858 · Updated 4 months ago
- ☆2,859 · Updated last month
- InspireMusic: A Unified Framework for Music, Song, Audio Generation. ☆1,060 · Updated last week
- Multilingual Voice Understanding Model ☆5,393 · Updated last month
- SOTA Open Source TTS ☆20,753 · Updated 2 weeks ago
- SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformer ☆4,024 · Updated this week
- Interface for OuteTTS models. ☆1,178 · Updated last week
- Wan: Open and Advanced Large-Scale Video Generative Models ☆10,569 · Updated this week
- Dockerized FastAPI wrapper for the Kokoro-82M text-to-speech model with CPU ONNX and NVIDIA GPU PyTorch support, handling, and auto-stitching ☆2,457 · Updated this week
- [CVPR 2025] EchoMimicV2: Towards Striking, Simplified, and Semi-Body Human Animation ☆3,584 · Updated last month
- A fast multimodal LLM for real-time voice ☆3,855 · Updated 2 months ago
- Inference and training library for high-quality TTS models. ☆5,212 · Updated 4 months ago
- MAGI-1: Autoregressive Video Generation at Scale ☆2,056 · Updated this week
- TTS with kokoro and onnx runtime ☆1,901 · Updated 2 weeks ago
- Let's make video diffusion practical! ☆8,944 · Updated this week
- Qwen2.5-Omni is an end-to-end multimodal model by the Qwen team at Alibaba Cloud, capable of understanding text, audio, vision, video, and pe… ☆2,634 · Updated last week
- Open-Source Chrome extension for AI-powered web automation. Run multi-agent workflows using your own LLM API key. Alternative to OpenAI O… ☆5,252 · Updated this week