open-mmlab/Amphion

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/open-mmlab/Amphion)

open-mmlab / Amphion

Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.

☆9,931

Alternatives and similar repositories for Amphion

Users that are interested in Amphion are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

fishaudio / fish-speech
View on GitHub
SOTA Open Source TTS
☆31,273Jun 9, 2026Updated last month
myshell-ai / OpenVoice
View on GitHub
Instant voice cloning by MIT and MyShell. Audio foundation model.
☆36,943Apr 19, 2025Updated last year
FunAudioLLM / CosyVoice
View on GitHub
Multi-lingual large voice generation model, providing inference, training and deployment full-stack ability.
☆22,188May 25, 2026Updated last month
huggingface / parler-tts
View on GitHub
Inference and training library for high-quality TTS models.
☆5,582Dec 10, 2024Updated last year
SWivid / F5-TTS
View on GitHub
Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"
☆14,952Jul 5, 2026Updated last week
Deploy to Railway using AI coding agents - Free Credits Offer • Ad
Use Claude Code, Codex, OpenCode, and more. Autonomous software development now has the infrastructure to match with Railway.
coqui-ai / TTS
View on GitHub
🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
☆45,756Aug 16, 2024Updated last year
shivammehta25 / Matcha-TTS
View on GitHub
[ICASSP 2024] 🍵 Matcha-TTS: A fast TTS architecture with conditional flow matching
☆1,331Updated this week
jasonppy / VoiceCraft
View on GitHub
Zero-Shot Speech Editing and Text-to-Speech in the Wild
☆8,495May 30, 2026Updated last month
Plachtaa / seed-vc
View on GitHub
zero-shot voice conversion & singing voice conversion, with real-time support
☆3,870Apr 20, 2025Updated last year
yl4579 / StyleTTS2
View on GitHub
StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models
☆6,312Aug 10, 2024Updated last year
lifeiteng / vall-e
View on GitHub
PyTorch implementation of VALL-E(Zero-Shot Text-To-Speech), Reproduced Demo https://lifeiteng.github.io/valle/index.html
☆2,205Sep 10, 2025Updated 10 months ago
suno-ai / bark
View on GitHub
🔊 Text-Prompted Generative Audio Model
☆39,197Aug 19, 2024Updated last year
gemelo-ai / vocos
View on GitHub
Vocos: Closing the gap between time-domain and Fourier-based neural vocoders for high-quality audio synthesis
☆1,144Aug 7, 2024Updated last year
facebookresearch / audiocraft
View on GitHub
Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor…
☆23,476Mar 3, 2026Updated 4 months ago
Deploy to Railway using AI coding agents - Free Credits Offer • Ad
Use Claude Code, Codex, OpenCode, and more. Autonomous software development now has the infrastructure to match with Railway.
descriptinc / descript-audio-codec
View on GitHub
State-of-the-art audio codec with 90x compression factor. Supports 44.1kHz, 24kHz, and 16kHz mono/stereo audio.
☆1,837Updated this week
metavoiceio / metavoice-src
View on GitHub
Foundational model for human-like, expressive TTS
☆4,201Jul 30, 2024Updated last year
haoheliu / AudioLDM2
View on GitHub
Text-to-Audio/Music Generation
☆2,634Sep 29, 2024Updated last year
kyutai-labs / moshi
View on GitHub
Moshi is a speech-text foundation model and full-duplex spoken dialogue framework. It uses Mimi, a state-of-the-art streaming neural audi…
☆10,591May 16, 2026Updated 2 months ago
lucidrains / naturalspeech2-pytorch
View on GitHub
Implementation of Natural Speech 2, Zero-shot Speech and Singing Synthesizer, in Pytorch
☆1,333Sep 24, 2023Updated 2 years ago
netease-youdao / EmotiVoice
View on GitHub
EmotiVoice 😊: a Multi-Voice and Prompt-Controlled TTS Engine
☆8,487Aug 13, 2024Updated last year
NVIDIA / BigVGAN
View on GitHub
Official PyTorch implementation of BigVGAN (ICLR 2023)
☆1,229Sep 5, 2024Updated last year
sh-lee-prml / HierSpeechpp
View on GitHub
The official implementation of HierSpeech++
☆1,240Feb 20, 2024Updated 2 years ago
RVC-Boss / GPT-SoVITS
View on GitHub
1 min voice data can also be used to train a good TTS model! (few shot voice cloning)
☆59,806Updated this week
Deploy on Railway without the complexity - Free Credits Offer • Ad
Connect your repo and Railway handles the rest with instant previews. Quickly provision container image services, databases, and storage volumes.
Stability-AI / stable-audio-tools
View on GitHub
Generative models for conditional audio generation
☆3,816Updated this week
facebookresearch / seamless_communication
View on GitHub
Foundational Models for State-of-the-Art Speech and Text Translation
☆11,817Apr 8, 2026Updated 3 months ago
zai-org / GLM-4-Voice
View on GitHub
GLM-4-Voice | 端到端中英语音对话模型
☆3,204Dec 5, 2024Updated last year
myshell-ai / MeloTTS
View on GitHub
High-quality multi-lingual text-to-speech library by MyShell.ai. Support English, Spanish, French, Chinese, Japanese and Korean.
☆7,539Dec 24, 2024Updated last year
yangdongchao / AcademiCodec
View on GitHub
AcademiCodec: An Open Source Audio Codec Model for Academic Research
☆674Dec 27, 2023Updated 2 years ago
OpenBMB / MiniCPM-V
View on GitHub
A Pocket-Sized MLLM for Ultra-Efficient Image and Video Understanding on Your Phone
☆25,891Jun 25, 2026Updated 3 weeks ago
modelscope / ClearerVoice-Studio
View on GitHub
An AI-Powered Speech Processing Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Enhancement, Separation, and Target Spe…
☆4,304Aug 14, 2025Updated 11 months ago
yangdongchao / UniAudio
View on GitHub
The Open Source Code of UniAudio
☆606Jul 22, 2024Updated last year
Plachtaa / VALL-E-X
View on GitHub
An open source implementation of Microsoft's VALL-E X zero-shot TTS model. Demo is available in https://plachtaa.github.io/vallex/
☆7,937Feb 11, 2024Updated 2 years ago
Virtual machines for every use case on DigitalOcean • Ad
Get dependable uptime with 99.99% SLA, simple security tools, and predictable monthly pricing with DigitalOcean's virtual machines, called Droplets.
modelscope / FunCodec
View on GitHub
FunCodec is a research-oriented toolkit for audio quantization and downstream applications, such as text-to-speech synthesis, music gener…
☆445Jan 25, 2024Updated 2 years ago
2noise / ChatTTS
View on GitHub
A generative speech model for daily dialogue.
☆39,619Apr 10, 2026Updated 3 months ago
FireRedTeam / FireRedTTS
View on GitHub
An Open-Sourced LLM-empowered Foundation TTS System
☆908Sep 28, 2025Updated 9 months ago
haoheliu / AudioLDM
View on GitHub
AudioLDM: Generate speech, sound effects, music and beyond, with text.
☆2,903Jun 25, 2025Updated last year
modelscope / FunASR
View on GitHub
Open-source speech recognition toolkit for training, inference, streaming ASR, VAD, punctuation, speaker diarization pipelines, and OpenA…
☆19,256Updated this week
KdaiP / StableTTS
View on GitHub
Next-generation TTS model using flow-matching and DiT, inspired by Stable Diffusion 3
☆437Sep 13, 2024Updated last year
ZhangXInFD / SpeechTokenizer
View on GitHub
This is the code for the SpeechTokenizer presented in the SpeechTokenizer: Unified Speech Tokenizer for Speech Language Models. Samples a…
☆661Jun 9, 2024Updated 2 years ago