open-mmlab / Amphion
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.
☆8,831 Updated 3 weeks ago
Alternatives and similar repositories for Amphion:
Users who are interested in Amphion are comparing it to the libraries listed below.
- Inference and training library for high-quality TTS models. ☆5,161 Updated 3 months ago
- High-quality multi-lingual text-to-speech library by MyShell.ai. Support English, Spanish, French, Chinese, Japanese and Korean. ☆5,801 Updated 3 months ago
- Zero-Shot Speech Editing and Text-to-Speech in the Wild ☆8,203 Updated last week
- StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models ☆5,560 Updated 7 months ago
- Accepted as [NeurIPS 2024] Spotlight Presentation Paper ☆6,242 Updated 6 months ago
- [SIGGRAPH Asia 2022] VideoReTalking: Audio-based Lip Synchronization for Talking Head Video Editing In the Wild ☆6,940 Updated 7 months ago
- SOTA Open Source TTS ☆20,165 Updated this week
- Instant voice cloning by MIT and MyShell. Audio foundation model. ☆31,425 Updated 2 months ago
- An Open Source text-to-speech system built by inverting Whisper. ☆4,166 Updated 3 months ago
- EmotiVoice 😊: a Multi-Voice and Prompt-Controlled TTS Engine ☆7,751 Updated 7 months ago
- Foundational model for human-like, expressive TTS ☆4,070 Updated 7 months ago
- AniPortrait: Audio-Driven Synthesis of Photorealistic Portrait Animation ☆4,893 Updated 8 months ago
- Moshi is a speech-text foundation model and full-duplex spoken dialogue framework. It uses Mimi, a state-of-the-art streaming neural audi… ☆7,875 Updated this week
- Generative models for conditional audio generation ☆2,968 Updated 3 weeks ago
- A fast multimodal LLM for real-time voice ☆3,757 Updated last month
- Use Microsoft Edge's online text-to-speech service from Python WITHOUT needing Microsoft Edge or Windows or an API key ☆7,755 Updated 3 weeks ago
- Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching" ☆10,663 Updated this week
- A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity… ☆9,049 Updated this week
- Foundational Models for State-of-the-Art Speech and Text Translation ☆11,434 Updated 4 months ago
- Multi-lingual large voice generation model, providing inference, training and deployment full-stack ability. ☆12,208 Updated this week
- StreamDiffusion: A Pipeline-Level Solution for Real-Time Interactive Generation ☆10,069 Updated 3 months ago
- Text-to-Audio/Music Generation ☆2,390 Updated 5 months ago
- PhotoMaker [CVPR 2024] ☆9,846 Updated 4 months ago
- Real time interactive streaming digital human ☆4,992 Updated last week
- Gradio WebUI for creators and developers, featuring key TTS (Edge-TTS, kokoro) and zero-shot Voice Cloning (E2 & F5-TTS, CosyVoice), with… ☆3,493 Updated this week
- Netflix-level subtitle cutting, translation, alignment, and even dubbing - one-click fully automated AI video subtitle team | Netflix-level subtitle cut… ☆12,031 Updated this week
- MuseTalk: Real-Time High Quality Lip Synchronization with Latent Space Inpainting ☆3,747 Updated 3 months ago
- Official implementation code of the paper "AnyText: Multilingual Visual Text Generation And Editing" ☆4,580 Updated 2 weeks ago
- An AI-Powered Speech Processing Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Enhancement, Separation, and Target Spe… ☆2,444 Updated last month
- ☆4,054 Updated 2 weeks ago