open-mmlab / Amphion
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.
☆7,527 · Updated last week
Related projects
Alternatives and complementary repositories for Amphion
- Inference and training library for high-quality TTS models. ☆4,604 · Updated last week
- Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching" ☆6,873 · Updated this week
- High-quality multilingual text-to-speech library by MyShell.ai. Supports English, Spanish, French, Chinese, Japanese, and Korean. ☆4,776 · Updated 3 months ago
- Brand-new TTS solution ☆14,323 · Updated this week
- Foundational model for human-like, expressive TTS ☆3,878 · Updated 3 months ago
- EmotiVoice 😊: a Multi-Voice and Prompt-Controlled TTS Engine ☆7,418 · Updated 3 months ago
- [SIGGRAPH Asia 2024, Journal Track] ToonCrafter: Generative Cartoon Interpolation ☆5,323 · Updated 2 months ago
- StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models ☆4,934 · Updated 3 months ago
- ☆6,708 · Updated last week
- AniPortrait: Audio-Driven Synthesis of Photorealistic Portrait Animation ☆4,630 · Updated 4 months ago
- Multilingual large voice generation model, providing full-stack inference, training, and deployment capabilities. ☆6,212 · Updated this week
- Zero-Shot Speech Editing and Text-to-Speech in the Wild ☆7,631 · Updated 4 months ago
- An open-source text-to-speech system built by inverting Whisper. ☆3,965 · Updated 4 months ago
- A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity… ☆6,899 · Updated this week
- Foundational Models for State-of-the-Art Speech and Text Translation ☆10,922 · Updated 2 months ago
- Accepted as a [NeurIPS 2024] Spotlight presentation paper ☆5,934 · Updated last month
- Multilingual Voice Understanding Model ☆3,383 · Updated 3 weeks ago
- Real-time interactive streaming digital human ☆3,854 · Updated this week
- Your image is almost there! ☆7,319 · Updated 3 months ago
- Generative models for conditional audio generation ☆2,709 · Updated last week
- Faster Whisper transcription with CTranslate2 ☆12,355 · Updated last week
- Code and dataset for photorealistic Codec Avatars driven from audio ☆2,707 · Updated last month
- Instant voice cloning by MIT and MyShell. ☆29,741 · Updated 2 months ago
- [SIGGRAPH Asia 2022] VideoReTalking: Audio-based Lip Synchronization for Talking Head Video Editing In the Wild ☆6,621 · Updated 3 months ago
- Open-source, accurate, and easy-to-use video speech recognition and clipping tool, with integrated LLM-based AI clipping. ☆3,635 · Updated 2 months ago
- V-Express aims to generate a talking head video under the control of a reference image, an audio, and a sequence of V-Kps images. ☆2,247 · Updated 3 weeks ago
- Distilled variant of Whisper for speech recognition. 6x faster, 50% smaller, within 1% word error rate. ☆3,611 · Updated 2 weeks ago
- Open-source framework for voice and multimodal conversational AI ☆3,354 · Updated this week
- 🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production ☆35,340 · Updated 2 months ago
- MARS5 speech model (TTS) from CAMB.AI ☆2,530 · Updated 3 months ago