fishaudio/fish-speech

fishaudio / fish-speech

SOTA Open Source TTS

☆31,382

Alternatives and similar repositories for fish-speech

Users who are interested in fish-speech are comparing it to the libraries listed below.

QwenAudio / CosyVoice
Multi-lingual large voice generation model, providing inference, training and deployment full-stack ability.
☆22,418 · May 25, 2026 · Updated 2 months ago
RVC-Boss / GPT-SoVITS
1 min voice data can also be used to train a good TTS model! (few shot voice cloning)
☆60,135 · Updated this week
2noise / ChatTTS
A generative speech model for daily dialogue.
☆39,683 · Apr 10, 2026 · Updated 3 months ago
SWivid / F5-TTS
Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"
☆15,017 · Updated this week
myshell-ai / OpenVoice
Instant voice cloning by MIT and MyShell. Audio foundation model.
☆37,026 · Apr 19, 2025 · Updated last year
coqui-ai / TTS
🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
☆45,820 · Aug 16, 2024 · Updated last year
modelscope / FunASR
Open-source speech recognition toolkit for training, inference, streaming ASR, VAD, punctuation, speaker diarization pipelines, and OpenA…
☆19,483 · Updated this week
open-mmlab / Amphion
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junio…
☆9,967 · Mar 25, 2026 · Updated 4 months ago
Comfy-Org / ComfyUI
The most powerful and modular diffusion model GUI, api and backend with a graph/nodes interface.
☆122,349 · Updated this week
index-tts / index-tts
An Industrial-Level Controllable and Efficient Zero-Shot Text-To-Speech System
☆22,170 · Jul 14, 2026 · Updated last week
microsoft / VibeVoice
Open-Source Frontier Voice AI
☆50,531 · Updated this week
lobehub / lobehub
🤯 LobeHub is your Chief Agent Operator, organizing your agents into 7×24 operations by hiring, scheduling, and reporting on your entire …
☆80,830 · Updated this week
fishaudio / Bert-VITS2
vits2 backbone with multilingual-bert
☆8,781 · Updated this week
QwenAudio / SenseVoice
Open-source SenseVoiceSmall model for Mandarin, Cantonese, English, Japanese, and Korean ASR, language ID, emotion recognition, and audio…
☆8,940 · Updated this week
OpenBMB / MiniCPM-V
A Pocket-Sized MLLM for Ultra-Efficient Image and Video Understanding on Your Phone
☆26,007 · Updated this week
resemble-ai / chatterbox
SoTA open-source TTS
☆25,710 · Updated this week
suno-ai / bark
🔊 Text-Prompted Generative Audio Model
☆39,216 · Aug 19, 2024 · Updated last year
huggingface / parler-tts
Inference and training library for high-quality TTS models.
☆5,581 · Dec 10, 2024 · Updated last year
langgenius / dify
Build Agentic workflows, RAG pipelines, with rich AI model and tool support on one collaborative workspace. Deploy on cloud, VPC, or self…
☆150,308 · Updated this week
myshell-ai / MeloTTS
High-quality multi-lingual text-to-speech library by MyShell.ai. Support English, Spanish, French, Chinese, Japanese and Korean.
☆7,551 · Dec 24, 2024 · Updated last year
unslothai / unsloth
Unsloth is a local UI for training and running Gemma 4, Qwen3.6, DeepSeek, Kimi, GLM and other models.
☆68,918 · Updated this week
SYSTRAN / faster-whisper
Faster Whisper transcription with CTranslate2
☆24,545 · Nov 19, 2025 · Updated 8 months ago
mem0ai / mem0
Universal memory layer for AI Agents
☆61,740 · Updated this week
hacksider / Deep-Live-Cam
real time face swap and one-click video deepfake with only a single image
☆95,259 · Updated this week
openai / whisper
Robust Speech Recognition via Large-Scale Weak Supervision
☆105,641 · Apr 15, 2026 · Updated 3 months ago
unclecode / crawl4ai
🚀🤖 Crawl4AI: Open-source LLM Friendly Web Crawler & Scraper. Don't be shy, join here: https://discord.gg/jP8KfhDhyN
☆75,077 · Updated this week
opendatalab / MinerU
Transforms complex documents like PDFs and Office docs into LLM-ready markdown/JSON for your Agentic workflows.
☆75,768 · Updated this week
OpenBMB / VoxCPM
VoxCPM2: Tokenizer-Free TTS for Multilingual Speech Generation, Creative Voice Design, and True-to-Life Cloning
☆34,236 · Jul 8, 2026 · Updated 2 weeks ago
k2-fsa / sherpa-onnx
Speech-to-text, text-to-speech, speaker diarization, speech enhancement, source separation, and VAD using next-gen Kaldi with onnxruntime…
☆13,795 · Updated this week
agno-agi / agno
Build, run, and manage agent platforms.
☆41,430 · Updated this week
netease-youdao / EmotiVoice
EmotiVoice 😊: a Multi-Voice and Prompt-Controlled TTS Engine
☆8,499 · Aug 13, 2024 · Updated last year
browser-use / browser-use
🌐 Make websites accessible for AI agents. Automate tasks online with ease.
☆106,866 · Updated this week
kyutai-labs / moshi
Moshi is a speech-text foundation model and full-duplex spoken dialogue framework. It uses Mimi, a state-of-the-art streaming neural audi…
☆10,726 · May 16, 2026 · Updated 2 months ago
stanford-oval / storm
An LLM-powered knowledge curation system that researches a topic and generates a full-length report with citations.
☆30,329 · Sep 30, 2025 · Updated 9 months ago
facefusion / facefusion
Industry leading face manipulation platform
☆29,400 · Updated this week
ollama / ollama
Get up and running with Kimi-K2.6, GLM-5.2, MiniMax, DeepSeek, gpt-oss, Qwen, Gemma and other models.
☆176,923 · Updated this week
Mintplex-Labs / anything-llm
Stop renting your intelligence. Own it with AnythingLLM. Everything you need for a powerful local-first agent experience
☆63,878 · Updated this week
SparkAudio / Spark-TTS
Spark-TTS Inference Code
☆10,999 · Apr 9, 2025 · Updated last year
infiniflow / ragflow
RAGFlow is a leading open-source Retrieval-Augmented Generation (RAG) engine that fuses cutting-edge RAG with Agent capabilities to creat…
☆86,041 · Updated this week