open-mmlab / Amphion
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.
☆9,097 · Updated this week
Alternatives and similar repositories for Amphion
Users interested in Amphion are comparing it to the libraries listed below.
- Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching" ☆12,101 · Updated last week
- Inference and training library for high-quality TTS models. ☆5,261 · Updated 5 months ago
- High-quality multilingual text-to-speech library by MyShell.ai. Supports English, Spanish, French, Chinese, Japanese and Korean. ☆6,099 · Updated 5 months ago
- SOTA Open Source TTS ☆21,227 · Updated this week
- EmotiVoice 😊: a Multi-Voice and Prompt-Controlled TTS Engine ☆7,992 · Updated 9 months ago
- StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models ☆5,750 · Updated 9 months ago
- MiniCPM-o 2.6: A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming on Your Phone ☆19,506 · Updated this week
- Multi-lingual large voice generation model, providing inference, training and deployment full-stack ability. ☆14,104 · Updated this week
- Zero-shot voice conversion & singing voice conversion, with real-time support ☆2,544 · Updated last month
- YuE: Open Full-song Music Generation Foundation Model, something similar to Suno.ai but open ☆5,007 · Updated 2 weeks ago
- Multilingual Voice Understanding Model ☆5,744 · Updated 2 months ago
- Instant voice cloning by MIT and MyShell. Audio foundation model. ☆32,401 · Updated last month
- Spark-TTS Inference Code ☆9,549 · Updated last month
- Text-to-Audio/Music Generation ☆2,425 · Updated 8 months ago
- Foundational model for human-like, expressive TTS ☆4,129 · Updated 10 months ago
- Zero-Shot Speech Editing and Text-to-Speech in the Wild ☆8,270 · Updated 2 months ago
- An Open Source text-to-speech system built by inverting Whisper. ☆4,257 · Updated last month
- Gradio WebUI for creators and developers, featuring key TTS (Edge-TTS, kokoro) and zero-shot Voice Cloning (E2 & F5-TTS, CosyVoice), with… ☆3,689 · Updated this week
- Generative models for conditional audio generation ☆3,268 · Updated last week
- Official implementation code of the paper <AnyText: Multilingual Visual Text Generation And Editing> ☆4,659 · Updated 2 months ago
- PhotoMaker [CVPR 2024] ☆9,942 · Updated 7 months ago
- Moshi is a speech-text foundation model and full-duplex spoken dialogue framework. It uses Mimi, a state-of-the-art streaming neural audi… ☆8,319 · Updated last week
- Foundational Models for State-of-the-Art Speech and Text Translation ☆11,533 · Updated 6 months ago
- Accepted as [NeurIPS 2024] Spotlight Presentation Paper ☆6,293 · Updated 8 months ago
- 1 min voice data can also be used to train a good TTS model! (few-shot voice cloning)