open-mmlab / Amphion
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.
☆9,536 Updated 6 months ago
Alternatives and similar repositories for Amphion
Users who are interested in Amphion are comparing it to the libraries listed below.
- High-quality multi-lingual text-to-speech library by MyShell.ai. Supports English, Spanish, French, Chinese, Japanese and Korean. ☆7,038 Updated 11 months ago
- Inference and training library for high-quality TTS models. ☆5,495 Updated last year
- EmotiVoice 😊: a Multi-Voice and Prompt-Controlled TTS Engine ☆8,389 Updated last year
- Foundational model for human-like, expressive TTS ☆4,197 Updated last year
- AniPortrait: Audio-Driven Synthesis of Photorealistic Portrait Animation ☆5,025 Updated last year
- [SIGGRAPH Asia 2022] VideoReTalking: Audio-based Lip Synchronization for Talking Head Video Editing In the Wild ☆7,181 Updated last year
- Zero-Shot Speech Editing and Text-to-Speech in the Wild ☆8,437 Updated 9 months ago
- Real time interactive streaming digital human ☆6,862 Updated 3 weeks ago
- Zero-shot voice conversion & singing voice conversion, with real-time support ☆3,462 Updated 7 months ago
- StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models ☆6,082 Updated last year
- Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching" ☆13,773 Updated 2 weeks ago
- AI powered speech denoising and enhancement ☆2,099 Updated last year
- A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity… ☆13,818 Updated this week
- SOTA Open Source TTS ☆24,310 Updated 2 weeks ago
- An AI-Powered Speech Processing Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Enhancement, Separation, and Target Spe… ☆3,732 Updated 4 months ago
- Instant voice cloning by MIT and MyShell. Audio foundation model. ☆35,604 Updated 7 months ago
- MARS5 speech model (TTS) from CAMB.AI ☆2,806 Updated last year
- Multilingual Voice Understanding Model ☆7,149 Updated 4 months ago
- Distilled variant of Whisper for speech recognition. 6x faster, 50% smaller, within 1% word error rate. ☆3,998 Updated 11 months ago
- MuseTalk: Real-Time High Quality Lip Synchronization with Latent Space Inpainting ☆5,050 Updated 2 months ago
- Code and dataset for photorealistic Codec Avatars driven from audio ☆2,845 Updated last year
- V-Express aims to generate a talking head video under the control of a reference image, an audio clip, and a sequence of V-Kps images. ☆2,359 Updated 10 months ago
- Accepted as [NeurIPS 2024] Spotlight Presentation Paper ☆6,369 Updated last year
- Multi-lingual large voice generation model, providing full-stack inference, training, and deployment capabilities. ☆17,619 Updated last month
- A fast multimodal LLM for real-time voice ☆4,283 Updated this week
- 🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production ☆43,837 Updated last year
- Generative models for conditional audio generation ☆3,534 Updated 2 months ago
- MuseV: Infinite-length and High Fidelity Virtual Human Video Generation with Visual Conditioned Parallel Denoising ☆2,807 Updated last year
- Speech-to-text, text-to-speech, speaker diarization, speech enhancement, source separation, and VAD using next-gen Kaldi with onnxruntime… ☆9,333 Updated this week
- Text-to-Audio/Music Generation ☆2,536 Updated last year