open-mmlab / Amphion
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.
☆9,277Updated 2 months ago
Alternatives and similar repositories for Amphion
Users interested in Amphion are comparing it to the libraries listed below
- High-quality multi-lingual text-to-speech library by MyShell.ai. Support English, Spanish, French, Chinese, Japanese and Korean.☆6,610Updated 7 months ago
- Inference and training library for high-quality TTS models.☆5,380Updated 8 months ago
- SOTA Open Source TTS☆22,643Updated 2 weeks ago
- An AI-Powered Speech Processing Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Enhancement, Separation, and Target Spe…☆3,170Updated 2 weeks ago
- Speech-to-text, text-to-speech, speaker diarization, speech enhancement, source separation, and VAD using next-gen Kaldi with onnxruntime…☆6,992Updated this week
- EmotiVoice 😊: a Multi-Voice and Prompt-Controlled TTS Engine☆8,125Updated 11 months ago
- An Open Source text-to-speech system built by inverting Whisper.☆4,329Updated 2 months ago
- StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models☆5,899Updated last year
- Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"☆12,891Updated 2 weeks ago
- zero-shot voice conversion & singing voice conversion, with real-time support☆2,824Updated 3 months ago
- Generative models for conditional audio generation☆3,396Updated 3 weeks ago
- Foundational model for human-like, expressive TTS☆4,144Updated last year
- Moshi is a speech-text foundation model and full-duplex spoken dialogue framework. It uses Mimi, a state-of-the-art streaming neural audi…☆8,720Updated last week
- Zero-Shot Speech Editing and Text-to-Speech in the Wild☆8,360Updated 4 months ago
- [SIGGRAPH Asia 2022] VideoReTalking: Audio-based Lip Synchronization for Talking Head Video Editing In the Wild☆7,116Updated last year
- Text-to-Audio/Music Generation☆2,481Updated 10 months ago
- Multilingual Voice Understanding Model☆6,329Updated last month
- Instant voice cloning by MIT and MyShell. Audio foundation model.☆33,864Updated 3 months ago
- Multi-lingual large voice generation model, providing inference, training and deployment full-stack ability.☆15,585Updated last week
- Open-source, accurate and easy-to-use video speech recognition & clipping tool, LLM based AI clipping integrated.☆4,829Updated last month
- GLM-4-Voice | End-to-end Chinese-English spoken dialogue model☆3,003Updated 8 months ago
- AniPortrait: Audio-Driven Synthesis of Photorealistic Portrait Animation☆4,996Updated last year
- AI powered speech denoising and enhancement☆1,911Updated 8 months ago
- Distilled variant of Whisper for speech recognition. 6x faster, 50% smaller, within 1% word error rate.☆3,929Updated 7 months ago
- V-Express aims to generate a talking head video under the control of a reference image, an audio, and a sequence of V-Kps images.☆2,349Updated 6 months ago
- Bring portraits to life!☆16,750Updated last month
- A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity…☆11,855Updated this week
- Foundational Models for State-of-the-Art Speech and Text Translation☆11,627Updated 8 months ago
- An open source implementation of Microsoft's VALL-E X zero-shot TTS model. Demo is available in https://plachtaa.github.io/vallex/☆7,908Updated last year
- Enjoy the magic of Diffusion models!☆9,300Updated this week