Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.
☆9,706 · May 27, 2025 · Updated 9 months ago
Alternatives and similar repositories for Amphion
Users that are interested in Amphion are comparing it to the libraries listed below
- SOTA Open Source TTS · ☆25,078 · Feb 2, 2026 · Updated last month
- Instant voice cloning by MIT and MyShell. Audio foundation model. · ☆36,025 · Apr 19, 2025 · Updated 10 months ago
- Inference and training library for high-quality TTS models. · ☆5,541 · Dec 10, 2024 · Updated last year
- Multi-lingual large voice generation model, providing inference, training and deployment full-stack ability. · ☆19,786 · Feb 11, 2026 · Updated 3 weeks ago
- Zero-Shot Speech Editing and Text-to-Speech in the Wild · ☆8,463 · Mar 15, 2025 · Updated 11 months ago
- Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching" · ☆14,122 · Updated this week
- 🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production · ☆44,691 · Aug 16, 2024 · Updated last year
- 🔊 Text-Prompted Generative Audio Model · ☆39,006 · Aug 19, 2024 · Updated last year
- Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor… · ☆23,029 · Mar 13, 2025 · Updated 11 months ago
- StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models · ☆6,187 · Aug 10, 2024 · Updated last year
- zero-shot voice conversion & singing voice conversion, with real-time support · ☆3,601 · Apr 20, 2025 · Updated 10 months ago
- Foundational model for human-like, expressive TTS · ☆4,199 · Jul 30, 2024 · Updated last year
- A Gemini 2.5 Flash Level MLLM for Vision, Speech, and Full-Duplex Multimodal Live Streaming on Your Phone · ☆23,942 · Feb 23, 2026 · Updated last week
- EmotiVoice 😊: a Multi-Voice and Prompt-Controlled TTS Engine · ☆8,450 · Aug 13, 2024 · Updated last year
- 1 minute of voice data can also be used to train a good TTS model! (few-shot voice cloning) · ☆55,429 · Feb 9, 2026 · Updated 3 weeks ago
- [ICASSP 2024] 🍵 Matcha-TTS: A fast TTS architecture with conditional flow matching · ☆1,250 · Feb 23, 2026 · Updated last week
- Text-to-Audio/Music Generation · ☆2,587 · Sep 29, 2024 · Updated last year
- Moshi is a speech-text foundation model and full-duplex spoken dialogue framework. It uses Mimi, a state-of-the-art streaming neural audi… · ☆9,750 · Feb 12, 2026 · Updated 2 weeks ago
- Foundational Models for State-of-the-Art Speech and Text Translation · ☆11,762 · Nov 14, 2024 · Updated last year
- High-quality multi-lingual text-to-speech library by MyShell.ai. Supports English, Spanish, French, Chinese, Japanese and Korean. · ☆7,232 · Dec 24, 2024 · Updated last year
- Generative models for conditional audio generation · ☆3,611 · Feb 14, 2026 · Updated 2 weeks ago
- Vocos: Closing the gap between time-domain and Fourier-based neural vocoders for high-quality audio synthesis · ☆1,069 · Aug 7, 2024 · Updated last year
- PyTorch implementation of VALL-E (Zero-Shot Text-To-Speech). Reproduced demo: https://lifeiteng.github.io/valle/index.html · ☆2,203 · Sep 10, 2025 · Updated 5 months ago
- An open source implementation of Microsoft's VALL-E X zero-shot TTS model. Demo is available at https://plachtaa.github.io/vallex/ · ☆7,957 · Feb 11, 2024 · Updated 2 years ago
- An LLM-powered knowledge curation system that researches a topic and generates a full-length report with citations. · ☆27,949 · Sep 30, 2025 · Updated 5 months ago
- A generative speech model for daily dialogue. · ☆38,850 · Jan 18, 2026 · Updated last month
- The official implementation of HierSpeech++ · ☆1,243 · Feb 20, 2024 · Updated 2 years ago
- An AI-Powered Speech Processing Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Enhancement, Separation, and Target Spe… · ☆3,936 · Aug 14, 2025 · Updated 6 months ago
- A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity… · ☆15,036 · Updated this week
- State-of-the-art audio codec with 90x compression factor. Supports 44.1kHz, 24kHz, and 16kHz mono/stereo audio. · ☆1,718 · Jan 26, 2026 · Updated last month
- Open-Sora: Democratizing Efficient Video Production for All · ☆28,632 · Apr 30, 2025 · Updated 10 months ago
- OCR, layout analysis, reading order, table recognition in 90+ languages · ☆19,360 · Feb 24, 2026 · Updated last week
- A fast multimodal LLM for real-time voice · ☆4,367 · Dec 12, 2025 · Updated 2 months ago
- GLM-4-Voice | End-to-end Chinese-English spoken dialogue model · ☆3,141 · Dec 5, 2024 · Updated last year
- An Open Source text-to-speech system built by inverting Whisper. · ☆4,567 · Dec 14, 2025 · Updated 2 months ago
- AudioLDM: Generate speech, sound effects, music and beyond, with text. · ☆2,831 · Jun 25, 2025 · Updated 8 months ago
- Official PyTorch implementation of BigVGAN (ICLR 2023) · ☆1,188 · Sep 5, 2024 · Updated last year
- Automate browser based workflows with AI · ☆20,629 · Updated this week
- An open-source RAG-based tool for chatting with your documents. · ☆25,168 · Updated this week