open-mmlab / Amphion
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.
☆9,247Updated last month
Alternatives and similar repositories for Amphion
Users who are interested in Amphion are comparing it to the libraries listed below.
- High-quality multi-lingual text-to-speech library by MyShell.ai. Supports English, Spanish, French, Chinese, Japanese, and Korean.☆6,265Updated 6 months ago
- Inference and training library for high-quality TTS models.☆5,349Updated 7 months ago
- SOTA Open Source TTS☆22,407Updated 2 weeks ago
- Generative models for conditional audio generation☆3,363Updated this week
- Instant voice cloning by MIT and MyShell. Audio foundation model.☆33,017Updated 3 months ago
- EmotiVoice 😊: a Multi-Voice and Prompt-Controlled TTS Engine☆8,095Updated 11 months ago
- Zero-Shot Speech Editing and Text-to-Speech in the Wild☆8,319Updated 4 months ago
- Moshi is a speech-text foundation model and full-duplex spoken dialogue framework. It uses Mimi, a state-of-the-art streaming neural audi…☆8,657Updated last week
- Foundational model for human-like, expressive TTS☆4,136Updated 11 months ago
- An AI-Powered Speech Processing Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Enhancement, Separation, and Target Spe…☆3,067Updated last week
- Zero-shot voice conversion and singing voice conversion, with real-time support☆2,735Updated 3 months ago
- Multi-lingual large voice generation model, providing full-stack inference, training, and deployment capabilities.☆15,268Updated this week
- AniPortrait: Audio-Driven Synthesis of Photorealistic Portrait Animation☆4,978Updated last year
- An Open Source text-to-speech system built by inverting Whisper.☆4,312Updated last month
- Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"☆12,659Updated this week
- Text-to-Audio/Music Generation☆2,470Updated 9 months ago
- Real time interactive streaming digital human☆5,986Updated 2 weeks ago
- V-Express aims to generate a talking head video under the control of a reference image, an audio clip, and a sequence of V-Kps images.☆2,346Updated 5 months ago
- Speech-to-text, text-to-speech, speaker diarization, speech enhancement, source separation, and VAD using next-gen Kaldi with onnxruntime…☆6,737Updated this week
- GLM-4-Voice | End-to-end Chinese-English spoken dialogue model☆2,979Updated 7 months ago
- Bring portraits to life!☆16,625Updated last month
- StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models☆5,846Updated 11 months ago
- Use Microsoft Edge's online text-to-speech service from Python WITHOUT needing Microsoft Edge or Windows or an API key☆8,652Updated 2 months ago
- Enjoy the magic of Diffusion models!☆9,063Updated this week
- Multilingual Voice Understanding Model☆6,171Updated 2 weeks ago
- AI powered speech denoising and enhancement☆1,880Updated 7 months ago
- Accepted as [NeurIPS 2024] Spotlight Presentation Paper☆6,308Updated 9 months ago
- A fast multimodal LLM for real-time voice☆4,099Updated 2 weeks ago
- MuseV: Infinite-length and High Fidelity Virtual Human Video Generation with Visual Conditioned Parallel Denoising☆2,754Updated last year
- Distilled variant of Whisper for speech recognition. 6x faster, 50% smaller, within 1% word error rate.☆3,917Updated 6 months ago