open-mmlab / Amphion
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.
☆9,174 · Updated 3 weeks ago
Alternatives and similar repositories for Amphion
Users interested in Amphion are comparing it to the libraries listed below.
- High-quality multilingual text-to-speech library by MyShell.ai. Supports English, Spanish, French, Chinese, Japanese, and Korean. ☆6,180 · Updated 5 months ago
- Inference and training library for high-quality TTS models. ☆5,303 · Updated 6 months ago
- SOTA Open Source TTS ☆21,914 · Updated last week
- Multilingual large voice generation model, providing full-stack inference, training, and deployment capabilities. ☆14,672 · Updated last week
- EmotiVoice 😊: a Multi-Voice and Prompt-Controlled TTS Engine ☆8,045 · Updated 10 months ago
- Foundational model for human-like, expressive TTS ☆4,132 · Updated 10 months ago
- Moshi is a speech-text foundation model and full-duplex spoken dialogue framework. It uses Mimi, a state-of-the-art streaming neural audi… ☆8,460 · Updated this week
- Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching" ☆12,322 · Updated last week
- An Open Source text-to-speech system built by inverting Whisper. ☆4,286 · Updated last week
- Multilingual Voice Understanding Model ☆5,951 · Updated 2 months ago
- Zero-shot voice conversion & singing voice conversion, with real-time support ☆2,635 · Updated 2 months ago
- A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity… ☆11,011 · Updated 3 weeks ago
- Instant voice cloning by MIT and MyShell. Audio foundation model. ☆32,668 · Updated 2 months ago
- Speech-to-text, text-to-speech, speaker diarization, speech enhancement, source separation, and VAD using next-gen Kaldi with onnxruntime… ☆6,412 · Updated this week
- A generative speech model for daily dialogue. ☆36,799 · Updated 3 weeks ago
- Distilled variant of Whisper for speech recognition. 6x faster, 50% smaller, within 1% word error rate. ☆3,883 · Updated 5 months ago
- StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models ☆5,799 · Updated 10 months ago
- Bring portraits to life! ☆16,277 · Updated last week
- Generative models for conditional audio generation ☆3,335 · Updated 2 weeks ago
- ML-powered speech recognition directly in your browser ☆2,963 · Updated 8 months ago
- Converts text to speech in realtime ☆3,175 · Updated last week
- [SIGGRAPH Asia 2022] VideoReTalking: Audio-based Lip Synchronization for Talking Head Video Editing In the Wild ☆7,078 · Updated 10 months ago
- Faster Whisper transcription with CTranslate2 ☆16,585 · Updated 2 weeks ago
- Zero-Shot Speech Editing and Text-to-Speech in the Wild ☆8,292 · Updated 3 months ago
- Gradio WebUI for creators and developers, featuring key TTS (Edge-TTS, kokoro) and zero-shot Voice Cloning (E2 & F5-TTS, CosyVoice), with… ☆3,717 · Updated 3 weeks ago
- Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor… ☆22,147 · Updated 3 months ago
- ☆5,549 · Updated last month
- An AI-Powered Speech Processing Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Enhancement, Separation, and Target Spe… ☆2,940 · Updated last week
- Netflix-level subtitle cutting, translation, alignment, and even dubbing: a one-click, fully automated AI video subtitle team ☆13,233 · Updated last month
- ☆1,332 · Updated last year