open-mmlab / Amphion
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.
☆9,580Updated 7 months ago
Alternatives and similar repositories for Amphion
Users who are interested in Amphion are comparing it to the libraries listed below.
- High-quality multi-lingual text-to-speech library by MyShell.ai. Supports English, Spanish, French, Chinese, Japanese, and Korean.☆7,114Updated last year
- Inference and training library for high-quality TTS models.☆5,503Updated last year
- Foundational model for human-like, expressive TTS☆4,196Updated last year
- EmotiVoice 😊: a Multi-Voice and Prompt-Controlled TTS Engine☆8,402Updated last year
- StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models☆6,104Updated last year
- Zero-Shot Speech Editing and Text-to-Speech in the Wild☆8,452Updated 9 months ago
- Zero-shot voice conversion and singing voice conversion, with real-time support☆3,500Updated 8 months ago
- SOTA Open Source TTS☆24,452Updated last month
- Instant voice cloning by MIT and MyShell. Audio foundation model.☆35,724Updated 8 months ago
- Generative models for conditional audio generation☆3,550Updated last week
- Code and dataset for photorealistic Codec Avatars driven from audio☆2,849Updated last year
- Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"☆13,882Updated last week
- An AI-Powered Speech Processing Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Enhancement, Separation, and Target Spe…☆3,799Updated 4 months ago
- An Open Source text-to-speech system built by inverting Whisper.☆4,547Updated 3 weeks ago
- Multi-lingual large voice generation model, providing inference, training and deployment full-stack ability.☆18,663Updated this week
- GLM-4-Voice | End-to-end Chinese-English spoken dialogue model☆3,107Updated last year
- [SIGGRAPH Asia 2022] VideoReTalking: Audio-based Lip Synchronization for Talking Head Video Editing In the Wild☆7,200Updated last year
- Speech-to-text, text-to-speech, speaker diarization, speech enhancement, source separation, and VAD using next-gen Kaldi with onnxruntime…☆9,539Updated last week
- AniPortrait: Audio-Driven Synthesis of Photorealistic Portrait Animation☆5,022Updated last year
- V-Express aims to generate a talking head video under the control of a reference image, an audio, and a sequence of V-Kps images.☆2,360Updated 11 months ago
- ☆4,582Updated 2 weeks ago
- AI powered speech denoising and enhancement☆2,131Updated last year
- Text-to-Audio/Music Generation☆2,551Updated last year
- Moshi is a speech-text foundation model and full-duplex spoken dialogue framework. It uses Mimi, a state-of-the-art streaming neural audi…☆9,227Updated last month
- Use Microsoft Edge's online text-to-speech service from Python WITHOUT needing Microsoft Edge or Windows or an API key☆9,708Updated 3 weeks ago
- A fast multimodal LLM for real-time voice☆4,306Updated 3 weeks ago
- A sound cloning tool with a web interface, using your voice or any sound to record audio☆8,871Updated 4 months ago
- ☆1,502Updated last year
- Real time interactive streaming digital human☆6,950Updated this week
- Multilingual Voice Understanding Model☆7,270Updated last week