open-mmlab / Amphion
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.
☆8,885Updated 3 weeks ago
Alternatives and similar repositories for Amphion:
Users who are interested in Amphion are comparing it to the libraries listed below:
- High-quality multi-lingual text-to-speech library by MyShell.ai. Supports English, Spanish, French, Chinese, Japanese, and Korean.☆5,858Updated 3 months ago
- Inference and training library for high-quality TTS models.☆5,168Updated 3 months ago
- Zero-Shot Speech Editing and Text-to-Speech in the Wild☆8,214Updated 2 weeks ago
- Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"☆10,821Updated this week
- EmotiVoice 😊: a Multi-Voice and Prompt-Controlled TTS Engine☆7,789Updated 7 months ago
- Multi-lingual large voice generation model, providing inference, training and deployment full-stack ability.☆12,574Updated last week
- Open-source, accurate and easy-to-use video speech recognition & clipping tool, LLM based AI clipping integrated.☆4,355Updated 3 weeks ago
- Multilingual Voice Understanding Model☆5,180Updated last week
- SOTA Open Source TTS☆20,369Updated last week
- Instant voice cloning by MIT and MyShell. Audio foundation model.☆31,548Updated 2 months ago
- StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models☆5,589Updated 7 months ago
- Use Microsoft Edge's online text-to-speech service from Python WITHOUT needing Microsoft Edge or Windows or an API key☆7,795Updated last month
- Gradio WebUI for creators and developers, featuring key TTS (Edge-TTS, kokoro) and zero-shot Voice Cloning (E2 & F5-TTS, CosyVoice), with…☆3,551Updated this week
- Moshi is a speech-text foundation model and full-duplex spoken dialogue framework. It uses Mimi, a state-of-the-art streaming neural audi…☆7,956Updated this week
- Code and dataset for photorealistic Codec Avatars driven from audio☆2,784Updated 6 months ago
- An open source implementation of Microsoft's VALL-E X zero-shot TTS model. Demo is available at https://plachtaa.github.io/vallex/ ☆7,836Updated last year
- Generative models for conditional audio generation☆2,978Updated last week
- V-Express aims to generate a talking head video under the control of a reference image, an audio, and a sequence of V-Kps images.☆2,319Updated 2 months ago
- Real time interactive streaming digital human☆5,114Updated this week
- Text-to-Audio/Music Generation☆2,397Updated 6 months ago
- [SIGGRAPH Asia 2022] VideoReTalking: Audio-based Lip Synchronization for Talking Head Video Editing In the Wild☆6,954Updated 7 months ago
- zero-shot voice conversion & singing voice conversion, with real-time support☆2,095Updated last week
- open-source multimodal large language model that can hear, talk while thinking. Featuring real-time end-to-end speech input and streaming…☆3,251Updated 4 months ago
- [ACM MM 2024] This is the official code for "AniTalker: Animate Vivid and Diverse Talking Faces through Identity-Decoupled Facial Motion …☆1,551Updated 7 months ago
- An Open Source text-to-speech system built by inverting Whisper.☆4,170Updated 3 months ago
- tiny vision language model☆7,701Updated this week
- AniPortrait: Audio-Driven Synthesis of Photorealistic Portrait Animation☆4,903Updated 9 months ago
- An AI-Powered Speech Processing Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Enhancement, Separation, and Target Spe…☆2,476Updated this week
- A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity…☆9,268Updated this week
- GLM-4-Voice | End-to-end Chinese-English speech dialogue model☆2,801Updated 3 months ago