open-mmlab / Amphion
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.
☆9,031Updated 3 weeks ago
Alternatives and similar repositories for Amphion:
Users who are interested in Amphion are comparing it to the libraries listed below:
- High-quality multi-lingual text-to-speech library by MyShell.ai. Supports English, Spanish, French, Chinese, Japanese, and Korean.☆6,012Updated 4 months ago
- SOTA Open Source TTS☆20,964Updated 3 weeks ago
- Inference and training library for high-quality TTS models.☆5,229Updated 4 months ago
- Multilingual Voice Understanding Model☆5,549Updated last month
- Multi-lingual large voice generation model, providing full-stack inference, training, and deployment capabilities.☆13,610Updated this week
- A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity…☆10,263Updated 2 weeks ago
- Foundational model for human-like, expressive TTS☆4,104Updated 9 months ago
- Moshi is a speech-text foundation model and full-duplex spoken dialogue framework. It uses Mimi, a state-of-the-art streaming neural audi…☆8,162Updated this week
- EmotiVoice 😊: a Multi-Voice and Prompt-Controlled TTS Engine☆7,951Updated 8 months ago
- A generative speech model for daily dialogue.☆36,104Updated this week
- V-Express aims to generate a talking head video under the control of a reference image, an audio clip, and a sequence of V-Kps images.☆2,327Updated 3 months ago
- GLM-4-Voice | End-to-end Chinese-English spoken dialogue model☆2,884Updated 5 months ago
- Officially recommended ChatTTS resource collection project, gathering related resources and frequently asked questions from across the web☆1,643Updated 10 months ago
- AniPortrait: Audio-Driven Synthesis of Photorealistic Portrait Animation☆4,940Updated 10 months ago
- Instant voice cloning by MIT and MyShell. Audio foundation model.☆32,089Updated 2 weeks ago
- Zero-shot voice conversion and singing voice conversion, with real-time support☆2,387Updated 2 weeks ago
- Open-source, accurate, and easy-to-use video speech recognition and clipping tool with integrated LLM-based AI clipping.☆4,494Updated last month
- MuseTalk: Real-Time High Quality Lip Synchronization with Latent Space Inpainting☆4,059Updated 2 weeks ago
- An open source implementation of Microsoft's VALL-E X zero-shot TTS model. Demo is available at https://plachtaa.github.io/vallex/☆7,864Updated last year
- StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models☆5,699Updated 8 months ago
- ☆1,290Updated 10 months ago
- An AI-Powered Speech Processing Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Enhancement, Separation, and Target Spe…☆2,698Updated last week
- Your image is almost there!☆7,589Updated 9 months ago
- AI powered speech denoising and enhancement☆1,770Updated 5 months ago
- Speech-to-text, text-to-speech, speaker diarization, speech enhancement, and VAD using next-gen Kaldi with onnxruntime without Internet c…☆5,840Updated last week
- Text-to-Audio/Music Generation☆2,418Updated 7 months ago
- Speech To Speech: an effort for an open-sourced and modular GPT4-o☆4,004Updated 3 weeks ago
- Faster Whisper transcription with CTranslate2☆15,776Updated last week
- Spark-TTS Inference Code☆9,041Updated last month
- Use Microsoft Edge's online text-to-speech service from Python WITHOUT needing Microsoft Edge or Windows or an API key☆8,120Updated this week