open-mmlab / Amphion
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.
☆9,380 Updated 3 months ago
Alternatives and similar repositories for Amphion
Users who are interested in Amphion are comparing it to the libraries listed below.
- High-quality multi-lingual text-to-speech library by MyShell.ai. Support English, Spanish, French, Chinese, Japanese and Korean. ☆6,760 Updated 8 months ago
- Inference and training library for high-quality TTS models. ☆5,411 Updated 9 months ago
- SOTA Open Source TTS ☆22,905 Updated last week
- EmotiVoice 😊: a Multi-Voice and Prompt-Controlled TTS Engine ☆8,318 Updated last year
- Foundational model for human-like, expressive TTS ☆4,159 Updated last year
- Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching" ☆13,236 Updated this week
- A fast multimodal LLM for real-time voice ☆4,192 Updated 2 weeks ago
- Multi-lingual large voice generation model, providing inference, training and deployment full-stack ability. ☆16,369 Updated this week
- An AI-Powered Speech Processing Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Enhancement, Separation, and Target Spe… ☆3,377 Updated last month
- StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models ☆5,957 Updated last year
- AniPortrait: Audio-Driven Synthesis of Photorealistic Portrait Animation ☆5,003 Updated last year
- V-Express aims to generate a talking head video under the control of a reference image, an audio, and a sequence of V-Kps images. ☆2,355 Updated 7 months ago
- [SIGGRAPH Asia 2022] VideoReTalking: Audio-based Lip Synchronization for Talking Head Video Editing In the Wild ☆7,128 Updated last year
- zero-shot voice conversion & singing voice conversion, with real-time support ☆2,998 Updated 4 months ago
- Code and dataset for photorealistic Codec Avatars driven from audio ☆2,842 Updated last year
- Accepted as [NeurIPS 2024] Spotlight Presentation Paper ☆6,343 Updated 11 months ago
- Zero-Shot Speech Editing and Text-to-Speech in the Wild ☆8,383 Updated 6 months ago
- Moshi is a speech-text foundation model and full-duplex spoken dialogue framework. It uses Mimi, a state-of-the-art streaming neural audi… ☆8,897 Updated last week
- An open source implementation of Microsoft's VALL-E X zero-shot TTS model. Demo is available in https://plachtaa.github.io/vallex/ ☆7,917 Updated last year
- Instant voice cloning by MIT and MyShell. Audio foundation model. ☆34,372 Updated 4 months ago
- Generative models for conditional audio generation ☆3,431 Updated 2 months ago
- Text-to-Audio/Music Generation ☆2,492 Updated 11 months ago
- Speech-to-text, text-to-speech, speaker diarization, speech enhancement, source separation, and VAD using next-gen Kaldi with onnxruntime… ☆7,394 Updated this week
- Multilingual Voice Understanding Model ☆6,598 Updated last month
- ☆4,516 Updated 3 months ago
- VITS: Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech ☆7,679 Updated last year
- Enjoy the magic of Diffusion models! ☆9,980 Updated last week
- Converts text to speech in realtime ☆3,503 Updated last month
- MARS5 speech model (TTS) from CAMB.AI ☆2,797 Updated last year
- Official implementation code of the paper <AnyText: Multilingual Visual Text Generation And Editing> ☆4,764 Updated 6 months ago