maskgct / maskgct.github.io
MaskGCT demo page
☆14Updated last year
Alternatives and similar repositories for maskgct.github.io
Users who are interested in maskgct.github.io are comparing it to the libraries listed below.
- ☆204Updated last year
- ☆70Updated 2 years ago
- Next-generation TTS model using flow-matching and DiT, inspired by Stable Diffusion 3☆434Updated last year
- [ICML 2025] SongGen: A Single Stage Auto-regressive Transformer for Text-to-Song Generation☆299Updated 3 months ago
- Ming-UniAudio: Speech LLM for Joint Understanding, Generation and Editing with Unified Representation☆429Updated 2 months ago
- GPT-4o-level, real-time spoken dialogue system.☆369Updated last year
- ☆484Updated 9 months ago
- Extension of ChatTTS, 3x Faster on Windows, Support Voice Cloning and Mobile Deployment☆172Updated last year
- ☆474Updated 8 months ago
- VC Without Retrain!☆129Updated last year
- ☆298Updated last year
- MooER: Moore-threads Open Omni model for speech-to-speech intERaction. MooER-omni includes a series of end-to-end speech interaction mode…☆219Updated last year
- ☆343Updated 9 months ago
- IndexTTS Fine-tuning notebooks☆132Updated 7 months ago
- [IJCV] FoleyCrafter: Bring Silent Videos to Life with Lifelike and Synchronized Sounds. An AI Foley master that adds vivid, synchronized sound effects to your silent videos 😝☆643Updated last year
- This is the official repository for M2UGen☆511Updated last year
- a text-conditional diffusion probabilistic model capable of generating high fidelity audio.☆188Updated last year
- An Open-Sourced LLM-empowered Foundation TTS System☆895Updated 4 months ago
- GPT-SoVITS2☆229Updated last year
- ☆167Updated last year
- High-quality Text-to-Audio Generation with Efficient Diffusion Transformer☆329Updated last month
- F5-TTS inference acceleration, roughly 4x faster!☆122Updated last year
- text to speech using autoregressive transformer and VITS☆249Updated last year
- ☆379Updated last year
- JoyHallo: Digital human model for Mandarin☆522Updated 4 months ago
- RealSI: Open Benchmark for Simultaneous Interpretation in Real-world Scenarios☆79Updated 7 months ago
- We Speech Transcript based on LLM, in 300 lines of code.☆183Updated 7 months ago
- Text-audio foundation model from Boson AI☆117Updated 5 months ago
- Unofficial implementation of Megatts2☆288Updated last year
- OSUM & OSUM-EChat, open speech understanding model and empathetic spoken chatbot based on it, open-sourced by ASLP@NPU.☆475Updated 2 months ago