kyutai-labs/hibiki-zero

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/kyutai-labs/hibiki-zero)

kyutai-labs / hibiki-zero

A real-time and multilingual speech translation model

☆262

Alternatives and similar repositories for hibiki-zero

Users that are interested in hibiki-zero are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

kyutai-labs / tts_longeval
View on GitHub
☆30Apr 29, 2026Updated 2 months ago
kyutai-labs / moshi-finetune
View on GitHub
☆473Oct 3, 2025Updated 9 months ago
Kugelaudio / kugelaudio-open
View on GitHub
Open-source text-to-speech for European languages with voice cloning
☆267Feb 6, 2026Updated 5 months ago
KetsuiLabs / MichiAI
View on GitHub
MichiAI: A Low Latency, Full Duplex Speech LLM with zero coherence loss
☆105Apr 24, 2026Updated 2 months ago
MaikeZuefle / f-actor
View on GitHub
☆28Updated this week
AI Agents on DigitalOcean Gradient AI Platform • Ad
Build production-ready AI agents using customizable tools or access multiple LLMs through a single endpoint. Create custom knowledge bases or connect external data.
kyutai-labs / delayed-streams-modeling
View on GitHub
Kyutai's Speech-To-Text and Text-To-Speech models based on the Delayed Streams Modeling framework.
☆2,980Jan 26, 2026Updated 5 months ago
Alittleegg / Eureka-Audio
View on GitHub
Eureka-Audio: A 1.7B lightweight audio–language model that matches 7B–30B models on ASR, audio understanding, and paralinguistic reasonin…
☆40Apr 11, 2026Updated 3 months ago
ekwek1 / soprano-factory
View on GitHub
Soprano-Factory: Train your own 2000x realtime text-to-speech model
☆249Jan 13, 2026Updated 6 months ago
kyutai-labs / moshi-rag
View on GitHub
MoshiRAG is a compact full-duplex speech language model augmented with asynchronous knowledge retrieval to improve factuality without sac…
☆130Apr 28, 2026Updated 2 months ago
wx9Songs / MOSS-Music-Data-Pipeline
View on GitHub
☆43Apr 26, 2026Updated 2 months ago
kyutai-labs / pocket-tts
View on GitHub
A TTS that fits in your CPU (and pocket)
☆7,793Updated this week
yu-haoyuan / fd-badcat
View on GitHub
fd-sds
☆20Apr 8, 2026Updated 3 months ago
ozspeech / OZSpeech
View on GitHub
[ACL 2025] OZSpeech: One-step Zero-shot Speech Synthesis with Learned-Prior-Conditioned Flow Matching
☆45Feb 9, 2025Updated last year
facebookresearch / omnilingual-asr
View on GitHub
Omnilingual ASR Open-Source Multilingual SpeechRecognition for 1600+ Languages
☆2,854Dec 30, 2025Updated 6 months ago
1-Click AI Models by DigitalOcean Gradient • Ad
Deploy popular AI models on DigitalOcean Gradient GPU virtual machines with just a single click. Zero configuration with optimized deployments.
SonyResearch / VRVQ
View on GitHub
Variable Bitrate Residual Vector Quantization for Audio Coding
☆54May 1, 2025Updated last year
adelacvg / diff-vits
View on GitHub
☆39Oct 1, 2023Updated 2 years ago
kyutai-labs / hibiki
View on GitHub
Hibiki is a model for streaming speech translation (also known as simultaneous translation). Unlike offline translation—where one waits f…
☆1,486Apr 15, 2025Updated last year
AI-S2-Lab / FluentEditor
View on GitHub
[InterSpeech'2024] FluentEditor:Text-based Speech Editing by Considering Acoustic and Prosody Consistency
☆62Oct 23, 2024Updated last year
xinshengwang / robpitch
View on GitHub
A pitch detection model trained to be robust against noise and reverberation environments.
☆27Jan 21, 2025Updated last year
ysharma3501 / MiraTTS
View on GitHub
A high quality and fast TTS repository
☆517Dec 22, 2025Updated 6 months ago
kyutai-labs / unmute
View on GitHub
Make text LLMs listen and speak
☆1,365Updated this week
ex3ndr / supervoice-hybrid
View on GitHub
My hybrid TTS network that combines, VALL-E, VoiceBox, SpeechFlow, Seamless and TortoiseTTS into one
☆26Aug 5, 2024Updated last year
pipecat-ai / stt-benchmark
View on GitHub
Benchmarking STT service TTFB and semantic WER for real-time AI applications
☆89Updated this week
Deploy to Railway using AI coding agents - Free Credits Offer • Ad
Use Claude Code, Codex, OpenCode, and more. Autonomous software development now has the infrastructure to match with Railway.
Soul-AILab / SAC
View on GitHub
[ACL 2026 Main] Training, inference, and testing of the SAC speech codec model.
☆108Nov 1, 2025Updated 8 months ago
HeCheng0625 / Diffusion-Speech-Tokenizer
View on GitHub
This repository contains a series of works on diffusion-based speech tokenizers, including the official implementation of the paper: "TaD…
☆198Jan 25, 2026Updated 5 months ago
the-bird-F / GLM-Voice-RAG
View on GitHub
[EMNLP 2025 Findings] A complete cross-modal RAG system for end-to-end speech-to-speech large models, including ASR-based Retrieval and E…
☆31Jul 11, 2025Updated last year
primepake / dac_vae
View on GitHub
Descript Audio Codec - VAE Variant (.dac-vae): High-Fidelity Audio Compression with Variational Autoencoder
☆38Aug 30, 2025Updated 10 months ago
Liquid4All / liquid-audio
View on GitHub
Liquid Audio - Speech-to-Speech audio models by Liquid AI
☆548Jun 5, 2026Updated last month
HumeAI / tada
View on GitHub
Open Source Speech Language Model
☆1,007May 11, 2026Updated 2 months ago
ysharma3501 / NovaSR
View on GitHub
A lightning fast audio upsampler.
☆774Feb 26, 2026Updated 4 months ago
ekwek1 / soprano
View on GitHub
Soprano: Instant, Ultra-Realistic Text-to-Speech
☆1,250Jan 15, 2026Updated 6 months ago
lingjzhu / spoken_sent_embedding
View on GitHub
Unsupervised spoken sentence embeddings
☆14Dec 14, 2022Updated 3 years ago
Managed Kubernetes at scale on DigitalOcean • Ad
DigitalOcean Kubernetes includes the control plane, bandwidth allowance, container registry, automatic updates, and more for free.
ysharma3501 / FlashSR
View on GitHub
Fast audio super resolution from 16khz to 48khz.
☆215Jan 3, 2026Updated 6 months ago
bfs18 / armel
View on GitHub
poorman's ar-dit tts
☆45Dec 31, 2025Updated 6 months ago
maxrmorrison / promonet
View on GitHub
Prosody and Pronunciation Modification Network
☆64May 5, 2025Updated last year
k2-fsa / ZipVoice
View on GitHub
Fast and High-Quality Zero-Shot Text-to-Speech with Flow Matching
☆1,015Dec 2, 2025Updated 7 months ago
JaesungHuh / ca-subtitle
View on GitHub
Implementation of "Look, Listen and Recognise:character-aware audio-visual subtitling"
☆21Nov 3, 2025Updated 8 months ago
ajd12342 / paraspeechclap
View on GitHub
Codebase for 'ParaSpeechCLAP: A Dual-Encoder Speech-Text Model for Rich Stylistic Language-Audio Pretraining'
☆23Jun 20, 2026Updated last month
lucadellalib / focalcodec
View on GitHub
A low-bitrate single-codebook 16 / 24 kHz speech codec based on focal modulation
☆172Nov 30, 2025Updated 7 months ago