k2-fsa/sherpa-onnx

Speech-to-text, text-to-speech, speaker diarization, speech enhancement, source separation, and VAD using next-gen Kaldi with onnxruntime, without an Internet connection. Supports embedded systems, Android, iOS, HarmonyOS, Raspberry Pi, RISC-V, RK NPU, Axera NPU, Ascend NPU, x86_64 servers, and a websocket server/client, with support for 12 programming languages.

☆13,851

Alternatives and similar repositories for sherpa-onnx

Users who are interested in sherpa-onnx are comparing it to the libraries listed below.

modelscope / FunASR
Open-source speech recognition toolkit for training, inference, streaming ASR, VAD, punctuation, speaker diarization pipelines, and OpenA…
☆19,542 · Updated this week
QwenAudio / SenseVoice
Open-source SenseVoiceSmall model for Mandarin, Cantonese, English, Japanese, and Korean ASR, language ID, emotion recognition, and audio…
☆8,954 · Updated this week
k2-fsa / sherpa-ncnn
Real-time speech recognition and voice activity detection (VAD) using next-gen Kaldi with ncnn without Internet connection. Support iOS, …
☆1,764 · Oct 20, 2025 · Updated 9 months ago
snakers4 / silero-vad
Silero VAD: pre-trained enterprise-grade Voice Activity Detector
☆9,791 · Jul 16, 2026 · Updated last week
k2-fsa / sherpa
Speech-to-text server framework with next-gen Kaldi
☆964 · Updated this week
fishaudio / fish-speech
SOTA Open Source TTS
☆31,559 · Updated this week
QwenAudio / CosyVoice
Multi-lingual large voice generation model, providing full-stack inference, training, and deployment capabilities.
☆22,464 · May 25, 2026 · Updated 2 months ago
ggml-org / whisper.cpp
Port of OpenAI's Whisper model in C/C++
☆52,406 · Updated this week
SYSTRAN / faster-whisper
Faster Whisper transcription with CTranslate2
☆24,609 · Nov 19, 2025 · Updated 8 months ago
TEN-framework / ten-vad
Voice Activity Detector (VAD): low-latency, high-performance, and lightweight
☆2,215 · Feb 2, 2026 · Updated 5 months ago
ruzhila / voiceapi
Streaming ASR and TTS based on FastAPI + sherpa-onnx
☆222 · Nov 2, 2025 · Updated 8 months ago
myshell-ai / MeloTTS
High-quality multi-lingual text-to-speech library by MyShell.ai. Supports English, Spanish, French, Chinese, Japanese, and Korean.
☆7,553 · Dec 24, 2024 · Updated last year
moonshine-ai / moonshine
Very low latency speech to text, intent recognition, and text to speech, for building voice agents and interfaces
☆10,520 · Updated this week
modelscope / 3D-Speaker
A Repository for Single- and Multi-modal Speaker Verification, Speaker Recognition and Speaker Diarization
☆3,077 · Dec 8, 2025 · Updated 7 months ago
coqui-ai / TTS
🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
☆45,836 · Aug 16, 2024 · Updated last year
2noise / ChatTTS
A generative speech model for daily dialogue.
☆39,696 · Apr 10, 2026 · Updated 3 months ago
FireRedTeam / FireRedASR
Open-source industrial-grade ASR models supporting Mandarin, Chinese dialects and English, achieving a new SOTA on public Mandarin ASR be…
☆1,947 · Feb 25, 2026 · Updated 5 months ago
QuentinFuxa / WhisperLiveKit
Simultaneous speech-to-text models
☆10,567 · Jul 19, 2026 · Updated last week
k2-fsa / ZipVoice
Fast and High-Quality Zero-Shot Text-to-Speech with Flow Matching
☆1,023 · Dec 2, 2025 · Updated 7 months ago
m-bain / whisperX
WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)
☆23,320 · Jul 13, 2026 · Updated 2 weeks ago
OpenBMB / MiniCPM-V
A Pocket-Sized MLLM for Ultra-Efficient Image and Video Understanding on Your Phone
☆26,049 · Updated this week
lovemefan / SenseVoice.cpp
Port of FunASR's SenseVoice model in C/C++
☆569 · Dec 19, 2025 · Updated 7 months ago
openai / whisper
Robust Speech Recognition via Large-Scale Weak Supervision
☆106,032 · Updated this week
hexgrad / kokoro
https://hf.co/hexgrad/Kokoro-82M
☆8,168 · Aug 6, 2025 · Updated 11 months ago
k2-fsa / icefall
☆1,467 · Jul 16, 2026 · Updated last week
RVC-Boss / GPT-SoVITS
1 min voice data can also be used to train a good TTS model! (few shot voice cloning)
☆60,205 · Jul 22, 2026 · Updated last week
index-tts / index-tts
An Industrial-Level Controllable and Efficient Zero-Shot Text-To-Speech System
☆22,251 · Jul 14, 2026 · Updated 2 weeks ago
modelscope / ClearerVoice-Studio
An AI-Powered Speech Processing Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Enhancement, Separation, and Target Spe…
☆4,345 · Aug 14, 2025 · Updated 11 months ago
microsoft / VibeVoice
Open-Source Frontier Voice AI
☆51,268 · Updated this week
rany2 / edge-tts
Use Microsoft Edge's online text-to-speech service from Python WITHOUT needing Microsoft Edge or Windows or an API key
☆11,608 · Mar 22, 2026 · Updated 4 months ago
ggml-org / llama.cpp
LLM inference in C/C++
☆121,984 · Updated this week
alphacep / vosk-api
Offline speech recognition API for Android, iOS, Raspberry Pi and servers with Python, Java, C# and Node
☆15,011 · Jul 2, 2026 · Updated 3 weeks ago
langgenius / dify
Build Agentic workflows, RAG pipelines, with rich AI model and tool support on one collaborative workspace. Deploy on cloud, VPC, or self…
☆150,693 · Updated this week
thewh1teagle / sherpa-rs
Rust bindings to https://github.com/k2-fsa/sherpa-onnx
☆311 · Mar 8, 2026 · Updated 4 months ago
lipku / LiveTalking
Real-time interactive streaming digital human
☆8,559 · Jul 19, 2026 · Updated last week
myshell-ai / OpenVoice
Instant voice cloning by MIT and MyShell. Audio foundation model.
☆37,046 · Apr 19, 2025 · Updated last year
PaddlePaddle / PaddleOCR
Turn any PDF or image document into structured data for your AI. A powerful, lightweight OCR toolkit that bridges the gap between images/…
☆86,350 · Jul 22, 2026 · Updated last week
rhasspy / piper
A fast, local neural text-to-speech system
☆11,267 · Aug 26, 2025 · Updated 11 months ago
78 / xiaozhi-esp32
An MCP-based chatbot | 一个基于MCP的聊天机器人
☆28,457 · Updated this week