open-mmlab / Amphion
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.
☆8,969 Updated last week
Alternatives and similar repositories for Amphion:
Users who are interested in Amphion are comparing it to the libraries listed below:
- Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching" ☆11,448 Updated this week
- SOTA Open Source TTS ☆20,753 Updated last week
- Inference and training library for high-quality TTS models. ☆5,199 Updated 4 months ago
- High-quality multi-lingual text-to-speech library by MyShell.ai. Support English, Spanish, French, Chinese, Japanese and Korean. ☆5,946 Updated 4 months ago
- A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity… ☆9,922 Updated this week
- Zero-Shot Speech Editing and Text-to-Speech in the Wild ☆8,251 Updated last month
- A powerful framework for building realtime voice AI agents 🤖🎙️📹 ☆5,679 Updated this week
- Multi-lingual large voice generation model, providing inference, training and deployment full-stack ability. ☆13,204 Updated this week
- StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models ☆5,656 Updated 8 months ago
- Instant voice cloning by MIT and MyShell. Audio foundation model. ☆31,877 Updated this week
- Foundational Models for State-of-the-Art Speech and Text Translation ☆11,485 Updated 5 months ago
- Moshi is a speech-text foundation model and full-duplex spoken dialogue framework. It uses Mimi, a state-of-the-art streaming neural audi… ☆8,106 Updated this week
- EmotiVoice 😊: a Multi-Voice and Prompt-Controlled TTS Engine ☆7,918 Updated 8 months ago
- Multilingual Voice Understanding Model ☆5,393 Updated last month
- 🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production ☆39,485 Updated 8 months ago
- An Open Source text-to-speech system built by inverting Whisper. ☆4,206 Updated 2 weeks ago
- A fast multimodal LLM for real-time voice ☆3,855 Updated 2 months ago
- Gradio WebUI for creators and developers, featuring key TTS (Edge-TTS, kokoro) and zero-shot Voice Cloning (E2 & F5-TTS, CosyVoice), with… ☆3,610 Updated last week
- Use Microsoft Edge's online text-to-speech service from Python WITHOUT needing Microsoft Edge or Windows or an API key ☆8,057 Updated this week
- Speech-to-text, text-to-speech, speaker diarization, speech enhancement, and VAD using next-gen Kaldi with onnxruntime without Internet c… ☆5,670 Updated this week
- Foundational model for human-like, expressive TTS ☆4,098 Updated 8 months ago
- An AI-Powered Speech Processing Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Enhancement, Separation, and Target Spe… ☆2,644 Updated last week
- OCR, layout analysis, reading order, table recognition in 90+ languages ☆17,200 Updated this week
- AI app store powered by 24/7 desktop history. open source | 100% local | dev friendly | 24/7 screen, mic recording ☆13,542 Updated last week
- AniPortrait: Audio-Driven Synthesis of Photorealistic Portrait Animation ☆4,933 Updated 9 months ago
- ☆4,213 Updated last month
- AI powered speech denoising and enhancement ☆1,756 Updated 4 months ago
- Faster Whisper transcription with CTranslate2 ☆15,613 Updated last month
- Text-to-Audio/Music Generation ☆2,407 Updated 6 months ago
- GLM-4-Voice | End-to-end Chinese-English spoken dialogue model ☆2,858 Updated 4 months ago