open-mmlab / Amphion
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.
☆9,467 · Updated 5 months ago
Alternatives and similar repositories for Amphion
Users interested in Amphion are comparing it to the libraries listed below.
- Inference and training library for high-quality TTS models. ☆5,452 · Updated 10 months ago
- High-quality multilingual text-to-speech library by MyShell.ai. Supports English, Spanish, French, Chinese, Japanese, and Korean. ☆6,900 · Updated 10 months ago
- Zero-Shot Speech Editing and Text-to-Speech in the Wild ☆8,424 · Updated 7 months ago
- Foundational model for human-like, expressive TTS ☆4,194 · Updated last year
- SOTA Open Source TTS ☆23,899 · Updated last week
- Multilingual large voice generation model, providing full-stack inference, training, and deployment capabilities. ☆17,048 · Updated 2 weeks ago
- AniPortrait: Audio-Driven Synthesis of Photorealistic Portrait Animation ☆5,008 · Updated last year
- StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models ☆6,019 · Updated last year
- Zero-shot voice conversion and singing voice conversion, with real-time support ☆3,372 · Updated 6 months ago
- Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching" ☆13,515 · Updated this week
- An open-source text-to-speech system built by inverting Whisper. ☆4,513 · Updated 4 months ago
- [SIGGRAPH Asia 2022] VideoReTalking: Audio-based Lip Synchronization for Talking Head Video Editing In the Wild ☆7,148 · Updated last year
- Multilingual Voice Understanding Model ☆6,852 · Updated 2 months ago
- Speech-to-text, text-to-speech, speaker diarization, speech enhancement, source separation, and VAD using next-gen Kaldi with onnxruntime… ☆8,595 · Updated last week
- An AI-Powered Speech Processing Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Enhancement, Separation, and Target Spe… ☆3,584 · Updated 2 months ago
- Moshi is a speech-text foundation model and full-duplex spoken dialogue framework. It uses Mimi, a state-of-the-art streaming neural audi… ☆9,043 · Updated this week
- AI-powered speech denoising and enhancement ☆2,028 · Updated 11 months ago
- ☆6,009 · Updated 2 months ago
- MuseTalk: Real-Time High-Quality Lip Synchronization with Latent Space Inpainting ☆4,888 · Updated last month
- Real-time interactive streaming digital human ☆6,666 · Updated last month
- Use Microsoft Edge's online text-to-speech service from Python WITHOUT needing Microsoft Edge or Windows or an API key ☆9,297 · Updated 2 months ago
- EmotiVoice 😊: a Multi-Voice and Prompt-Controlled TTS Engine ☆8,361 · Updated last year
- ☆1,452 · Updated last year
- MARS5 speech model (TTS) from CAMB.AI ☆2,802 · Updated last year
- Officially recommended ChatTTS resource collection project, compiling related resources and frequently asked questions from across the web ☆1,804 · Updated last year
- Open-source, accurate, and easy-to-use video speech recognition and clipping tool with integrated LLM-based AI clipping ☆5,095 · Updated 3 months ago
- An open-source implementation of Microsoft's VALL-E X zero-shot TTS model. A demo is available at https://plachtaa.github.io/vallex/ ☆7,956 · Updated last year
- Accepted as [NeurIPS 2024] Spotlight Presentation Paper ☆6,354 · Updated last year
- GLM-4-Voice | End-to-end Chinese-English spoken dialogue model ☆3,070 · Updated 11 months ago
- Text-to-Audio/Music Generation ☆2,510 · Updated last year