huggingface/speech-to-speech

Build local voice agents with open-source models

☆6,345

Alternatives and similar repositories for speech-to-speech

Users who are interested in speech-to-speech are comparing it to the repositories listed below.

kyutai-labs / moshi
Moshi is a speech-text foundation model and full-duplex spoken dialogue framework. It uses Mimi, a state-of-the-art streaming neural audi…
☆10,717 · May 16, 2026 · Updated 2 months ago
pipecat-ai / pipecat
Open Source framework for voice and multimodal conversational AI
☆13,702 · Updated this week
huggingface / parler-tts
Inference and training library for high-quality TTS models.
☆5,581 · Dec 10, 2024 · Updated last year
gpt-omni / mini-omni
Open-source multimodal large language model that can hear and talk while thinking. Featuring real-time end-to-end speech input and streaming…
☆3,563 · Nov 5, 2024 · Updated last year
fishaudio / fish-speech
SOTA Open Source TTS
☆31,373 · Updated this week
snakers4 / silero-vad
Silero VAD: pre-trained enterprise-grade Voice Activity Detector
☆9,672 · Jul 16, 2026 · Updated last week
BayLing-Models / BayLing-Speech
LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve spee…
☆3,143 · May 19, 2025 · Updated last year
fixie-ai / ultravox
A fast multimodal LLM for real-time voice
☆4,481 · Dec 12, 2025 · Updated 7 months ago
livekit / agents
A framework for building realtime voice AI agents 🤖🎙️📹
☆11,499 · Updated this week
SWivid / F5-TTS
Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"
☆15,004 · Updated this week
janhq / ichigo
Local realtime voice AI
☆2,490 · Nov 26, 2025 · Updated 7 months ago
zai-org / GLM-4-Voice
GLM-4-Voice | End-to-end Chinese-English spoken dialogue model
☆3,209 · Dec 5, 2024 · Updated last year
unslothai / unsloth
Unsloth is a local UI for training and running Gemma 4, Qwen3.6, DeepSeek, Kimi, GLM and other models.
☆68,878 · Updated this week
canopyai / Orpheus-TTS
Towards Human-Sounding Speech
☆6,260 · Dec 5, 2025 · Updated 7 months ago
edwko / OuteTTS
Interface for OuteTTS models.
☆1,436 · Mar 23, 2026 · Updated 4 months ago
resemble-ai / chatterbox
SoTA open-source TTS
☆25,697 · Updated this week
QwenAudio / CosyVoice
Multilingual large voice generation model, providing full-stack inference, training, and deployment capabilities.
☆22,400 · May 25, 2026 · Updated 2 months ago
Standard-Intelligence / hertz-dev
First base model for full-duplex conversational audio
☆1,794 · Jan 5, 2025 · Updated last year
pipecat-ai / smart-turn
☆1,483 · Jan 29, 2026 · Updated 5 months ago
mem0ai / mem0
Universal memory layer for AI Agents
☆61,666 · Updated this week
m-bain / whisperX
WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)
☆23,252 · Jul 13, 2026 · Updated last week
agno-agi / agno
Build, run, and manage agent platforms.
☆41,417 · Updated this week
moonshine-ai / moonshine
Very low latency speech to text, intent recognition, and text to speech, for building voice agents and interfaces
☆10,435 · Updated this week
QwenLM / Qwen2-Audio
The official repo of Qwen2-Audio chat & pretrained large audio language model proposed by Alibaba Cloud.
☆2,093 · Apr 21, 2025 · Updated last year
TEN-framework / ten-framework
Open-source framework for conversational voice AI agents
☆10,966 · Updated this week
kyutai-labs / delayed-streams-modeling
Kyutai's Speech-To-Text and Text-To-Speech models based on the Delayed Streams Modeling framework.
☆2,984 · Jan 26, 2026 · Updated 6 months ago
SYSTRAN / faster-whisper
Faster Whisper transcription with CTranslate2
☆24,518 · Nov 19, 2025 · Updated 8 months ago
coqui-ai / TTS
🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
☆45,809 · Aug 16, 2024 · Updated last year
SesameAILabs / csm
A Conversational Speech Generation Model
☆14,696 · May 27, 2025 · Updated last year
yl4579 / StyleTTS2
StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models
☆6,318 · Aug 10, 2024 · Updated last year
gradio-app / fastrtc
The Python library for real-time communication
☆4,616 · Jan 12, 2026 · Updated 6 months ago
Cinnamon / kotaemon
An open-source RAG-based tool for chatting with your documents.
☆25,619 · Jul 14, 2026 · Updated last week
OpenBMB / MiniCPM-V
A Pocket-Sized MLLM for Ultra-Efficient Image and Video Understanding on Your Phone
☆25,989 · Updated this week
myshell-ai / OpenVoice
Instant voice cloning by MIT and MyShell. Audio foundation model.
☆37,020 · Apr 19, 2025 · Updated last year
WhisperSpeech / WhisperSpeech
An Open Source text-to-speech system built by inverting Whisper.
☆4,624 · Dec 14, 2025 · Updated 7 months ago
Vaibhavs10 / insanely-fast-whisper
☆12,996 · Oct 25, 2025 · Updated 9 months ago
vllm-project / vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
☆87,138 · Updated this week
open-mmlab / Amphion
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junio…
☆9,964 · Mar 25, 2026 · Updated 4 months ago
QwenAudio / SenseVoice
Open-source SenseVoiceSmall model for Mandarin, Cantonese, English, Japanese, and Korean ASR, language ID, emotion recognition, and audio…
☆8,935 · Updated this week