open-mmlab / Amphion
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.
☆9,451Updated 5 months ago
Alternatives and similar repositories for Amphion
Users who are interested in Amphion are comparing it to the libraries listed below.
- High-quality multi-lingual text-to-speech library by MyShell.ai. Supports English, Spanish, French, Chinese, Japanese, and Korean.☆6,881Updated 10 months ago
- Inference and training library for high-quality TTS models.☆5,452Updated 10 months ago
- Multi-lingual large voice generation model, providing full-stack inference, training, and deployment capabilities.☆16,966Updated last week
- Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"☆13,484Updated this week
- EmotiVoice 😊: a Multi-Voice and Prompt-Controlled TTS Engine☆8,351Updated last year
- SOTA Open Source TTS☆23,558Updated last week
- StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models☆6,007Updated last year
- Foundational model for human-like, expressive TTS☆4,191Updated last year
- zero-shot voice conversion & singing voice conversion, with real-time support☆3,343Updated 6 months ago
- An Open Source text-to-speech system built by inverting Whisper.☆4,513Updated 4 months ago
- Zero-Shot Speech Editing and Text-to-Speech in the Wild☆8,420Updated 7 months ago
- Generative models for conditional audio generation☆3,468Updated 3 weeks ago
- Use Microsoft Edge's online text-to-speech service from Python WITHOUT needing Microsoft Edge or Windows or an API key☆9,249Updated 2 months ago
- An AI-Powered Speech Processing Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Enhancement, Separation, and Target Spe…☆3,541Updated 2 months ago
- Multilingual Voice Understanding Model☆6,820Updated 2 months ago
- AniPortrait: Audio-Driven Synthesis of Photorealistic Portrait Animation☆5,008Updated last year
- Speech-to-text, text-to-speech, speaker diarization, speech enhancement, source separation, and VAD using next-gen Kaldi with onnxruntime…☆8,595Updated this week
- V-Express aims to generate a talking head video under the control of a reference image, an audio, and a sequence of V-Kps images.☆2,358Updated 9 months ago
- Open-source, accurate, and easy-to-use video speech recognition & clipping tool with integrated LLM-based AI clipping.☆5,066Updated 3 months ago
- A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity…☆13,191Updated 3 weeks ago
- Instant voice cloning by MIT and MyShell. Audio foundation model.☆35,128Updated 6 months ago
- MARS5 speech model (TTS) from CAMB.AI☆2,800Updated last year
- Text-to-Audio/Music Generation☆2,508Updated last year
- Accepted as [NeurIPS 2024] Spotlight Presentation Paper☆6,354Updated last year
- Speech To Speech: an effort for an open-sourced and modular GPT4-o☆4,218Updated 6 months ago
- GLM-4-Voice | End-to-end Chinese-English spoken dialogue model☆3,067Updated 10 months ago
- A generative speech model for daily dialogue.☆38,022Updated 3 months ago
- ☆4,537Updated 4 months ago
- Fast and accurate automatic speech recognition (ASR) for edge devices☆2,938Updated last week
- Moshi is a speech-text foundation model and full-duplex spoken dialogue framework. It uses Mimi, a state-of-the-art streaming neural audi…☆9,022Updated 2 weeks ago