Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.
☆9,728 · Mar 25, 2026 · Updated last week
Alternatives and similar repositories for Amphion
Users interested in Amphion are comparing it to the libraries listed below.
- SOTA Open Source TTS ☆29,048 · Mar 30, 2026 · Updated last week
- Instant voice cloning by MIT and MyShell. Audio foundation model. ☆36,188 · Apr 19, 2025 · Updated 11 months ago
- Multilingual large voice generation model, providing full-stack inference, training, and deployment capabilities. ☆20,296 · Mar 16, 2026 · Updated 3 weeks ago
- Inference and training library for high-quality TTS models. ☆5,560 · Dec 10, 2024 · Updated last year
- Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching" ☆14,277 · Mar 24, 2026 · Updated last week
- 🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production ☆44,937 · Aug 16, 2024 · Updated last year
- [ICASSP 2024] 🍵 Matcha-TTS: A fast TTS architecture with conditional flow matching ☆1,273 · Mar 16, 2026 · Updated 3 weeks ago
- Zero-shot voice conversion & singing voice conversion, with real-time support ☆3,661 · Apr 20, 2025 · Updated 11 months ago
- Zero-Shot Speech Editing and Text-to-Speech in the Wild ☆8,475 · Mar 15, 2025 · Updated last year
- StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models ☆6,235 · Aug 10, 2024 · Updated last year
- Vocos: Closing the gap between time-domain and Fourier-based neural vocoders for high-quality audio synthesis ☆1,098 · Aug 7, 2024 · Updated last year
- 🔊 Text-Prompted Generative Audio Model ☆39,067 · Aug 19, 2024 · Updated last year
- PyTorch implementation of VALL-E (zero-shot text-to-speech). Reproduced demo: https://lifeiteng.github.io/valle/index.html ☆2,209 · Sep 10, 2025 · Updated 6 months ago
- Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor… ☆23,124 · Mar 3, 2026 · Updated last month
- State-of-the-art audio codec with a 90x compression factor. Supports 44.1 kHz, 24 kHz, and 16 kHz mono/stereo audio. ☆1,751 · Jan 26, 2026 · Updated 2 months ago
- Foundational model for human-like, expressive TTS ☆4,201 · Jul 30, 2024 · Updated last year
- Text-to-Audio/Music Generation ☆2,609 · Sep 29, 2024 · Updated last year
- Moshi is a speech-text foundation model and full-duplex spoken dialogue framework. It uses Mimi, a state-of-the-art streaming neural audi… ☆9,962 · Mar 4, 2026 · Updated last month
- Implementation of Natural Speech 2, Zero-shot Speech and Singing Synthesizer, in PyTorch ☆1,334 · Sep 24, 2023 · Updated 2 years ago
- EmotiVoice 😊: a Multi-Voice and Prompt-Controlled TTS Engine ☆8,461 · Aug 13, 2024 · Updated last year
- Official PyTorch implementation of BigVGAN (ICLR 2023) ☆1,200 · Sep 5, 2024 · Updated last year
- The official implementation of HierSpeech++ ☆1,241 · Feb 20, 2024 · Updated 2 years ago
- One minute of voice data can also be used to train a good TTS model! (few-shot voice cloning) ☆56,367 · Feb 9, 2026 · Updated last month
- Generative models for conditional audio generation ☆3,653 · Feb 14, 2026 · Updated last month
- Foundational Models for State-of-the-Art Speech and Text Translation ☆11,770 · Mar 3, 2026 · Updated last month
- High-quality multilingual text-to-speech library by MyShell.ai. Supports English, Spanish, French, Chinese, Japanese, and Korean. ☆7,308 · Dec 24, 2024 · Updated last year
- A Gemini 2.5 Flash Level MLLM for Vision, Speech, and Full-Duplex Multimodal Live Streaming on Your Phone ☆24,255 · Mar 7, 2026 · Updated 3 weeks ago
- AcademiCodec: An Open Source Audio Codec Model for Academic Research ☆670 · Dec 27, 2023 · Updated 2 years ago
- GLM-4-Voice | End-to-end Chinese-English spoken dialogue model ☆3,165 · Dec 5, 2024 · Updated last year
- FunCodec is a research-oriented toolkit for audio quantization and downstream applications, such as text-to-speech synthesis, music gener… ☆443 · Jan 25, 2024 · Updated 2 years ago
- The Open Source Code of UniAudio ☆605 · Jul 22, 2024 · Updated last year
- An AI-Powered Speech Processing Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Enhancement, Separation, and Target Spe… ☆4,011 · Aug 14, 2025 · Updated 7 months ago
- An open source implementation of Microsoft's VALL-E X zero-shot TTS model. Demo is available at https://plachtaa.github.io/vallex/ ☆7,955 · Feb 11, 2024 · Updated 2 years ago
- A generative speech model for daily dialogue. ☆39,007 · Jan 18, 2026 · Updated 2 months ago
- A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity… ☆15,456 · Mar 17, 2026 · Updated 2 weeks ago
- An Open-Sourced LLM-empowered Foundation TTS System ☆907 · Sep 28, 2025 · Updated 6 months ago
- AudioLDM: Generate speech, sound effects, music and beyond, with text. ☆2,847 · Jun 25, 2025 · Updated 9 months ago
- Next-generation TTS model using flow matching and DiT, inspired by Stable Diffusion 3 ☆435 · Sep 13, 2024 · Updated last year
- This is the code for the SpeechTokenizer presented in SpeechTokenizer: Unified Speech Tokenizer for Speech Language Models. Samples a… ☆650 · Jun 9, 2024 · Updated last year