Standard-Intelligence/hertz-dev

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/Standard-Intelligence/hertz-dev)

Standard-Intelligence / hertz-dev

first base model for full-duplex conversational audio

☆1,794

Alternatives and similar repositories for hertz-dev

Users that are interested in hertz-dev are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

edwko / OuteTTS
View on GitHub
Interface for OuteTTS models.
☆1,435Mar 23, 2026Updated 3 months ago
kyutai-labs / moshi
View on GitHub
Moshi is a speech-text foundation model and full-duplex spoken dialogue framework. It uses Mimi, a state-of-the-art streaming neural audi…
☆10,636May 16, 2026Updated 2 months ago
facebookresearch / spiritlm
View on GitHub
Inference code for the paper "Spirit-LM Interleaved Spoken and Written Language Model".
☆928Oct 28, 2024Updated last year
janhq / ichigo
View on GitHub
Local realtime voice AI
☆2,490Nov 26, 2025Updated 7 months ago
fixie-ai / ultravox
View on GitHub
A fast multimodal LLM for real-time voice
☆4,476Dec 12, 2025Updated 7 months ago
Managed hosting for WordPress and PHP on Cloudways • Ad
Managed hosting for WordPress, Magento, Laravel, or PHP apps, on multiple cloud providers. Deploy in minutes on Cloudways by DigitalOcean.
ictnlp / LLaMA-Omni
View on GitHub
LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve spee…
☆3,140May 19, 2025Updated last year
VITA-MLLM / Freeze-Omni
View on GitHub
✨✨Freeze-Omni: A Smart and Low Latency Speech-to-speech Dialogue Model with Frozen LLM
☆388May 27, 2025Updated last year
gpt-omni / mini-omni
View on GitHub
open-source multimodal large language model that can hear, talk while thinking. Featuring real-time end-to-end speech input and streaming…
☆3,562Nov 5, 2024Updated last year
yangdongchao / RSTnet
View on GitHub
Real-time Speech-Text Foundation Model Toolkit (wip)
☆256Mar 26, 2025Updated last year
zai-org / GLM-4-Voice
View on GitHub
GLM-4-Voice | 端到端中英语音对话模型
☆3,206Dec 5, 2024Updated last year
huggingface / parler-tts
View on GitHub
Inference and training library for high-quality TTS models.
☆5,582Dec 10, 2024Updated last year
SWivid / F5-TTS
View on GitHub
Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"
☆14,981Jul 5, 2026Updated 2 weeks ago
zhenye234 / X-Codec-2.0
View on GitHub
Codec for paper: LLaSA: Scaling Train-time and Inference-time Compute for LLaMA-based Speech Synthesis
☆360Jun 25, 2026Updated 3 weeks ago
yl4579 / StyleTTS2
View on GitHub
StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models
☆6,312Aug 10, 2024Updated last year
Proton VPN Special Offer - Get 70% off • Ad
Special partner offer. Trusted by over 100 million users worldwide. Tested, Approved and Recommended by Experts.
gpt-omni / mini-omni2
View on GitHub
Towards Open-source GPT-4o with Vision, Speech and Duplex Capabilities。
☆1,904Jan 16, 2025Updated last year
jishengpeng / WavChat
View on GitHub
A Survey of Spoken Dialogue Models (60 pages)
☆316Nov 28, 2024Updated last year
gemelo-ai / vocos
View on GitHub
Vocos: Closing the gap between time-domain and Fourier-based neural vocoders for high-quality audio synthesis
☆1,142Aug 7, 2024Updated last year
open-mmlab / Amphion
View on GitHub
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junio…
☆9,957Mar 25, 2026Updated 3 months ago
haoheliu / SemantiCodec-inference
View on GitHub
Ultra-low bitrate neural audio codec (0.31~1.40 kbps) with a better semantic in the latent space.
☆254Mar 7, 2025Updated last year
canopyai / Orpheus-TTS
View on GitHub
Towards Human-Sounding Speech
☆6,247Dec 5, 2025Updated 7 months ago
fishaudio / fish-speech
View on GitHub
SOTA Open Source TTS
☆31,332Jun 9, 2026Updated last month
WhisperSpeech / WhisperSpeech
View on GitHub
An Open Source text-to-speech system built by inverting Whisper.
☆4,625Dec 14, 2025Updated 7 months ago
PolyAI-LDN / pheme
View on GitHub
☆258Mar 15, 2024Updated 2 years ago
Wordpress hosting with auto-scaling - Free Trial Offer • Ad
Fully Managed hosting for WordPress and WooCommerce businesses that need reliable, auto-scalable performance. Cloudways SafeUpdates now available.
DigitalPhonetics / IMS-Toucan
View on GitHub
Controllable and fast Text-to-Speech for over 7000 languages!
☆2,206Jan 25, 2026Updated 5 months ago
walker-hyf / NCSSD
View on GitHub
Generative Expressive Conversational Speech Synthesis (Accepted by MM'2024)
☆61Nov 1, 2024Updated last year
Ereboas / MagiCodec
View on GitHub
A single-layer, streaming codec model providing SOTA audio quality and discrete tokens designed for superior downstream modelability.
☆124Jun 4, 2025Updated last year
lucidrains / e2-tts-pytorch
View on GitHub
Implementation of E2-TTS, "Embarrassingly Easy Fully Non-Autoregressive Zero-Shot TTS", in Pytorch
☆516Dec 20, 2025Updated 7 months ago
zhenye234 / LLaSA_training
View on GitHub
LLaSA: Scaling Train-time and Inference-time Compute for LLaMA-based Speech Synthesis
☆660Jan 21, 2026Updated 5 months ago
modelscope / ClearerVoice-Studio
View on GitHub
An AI-Powered Speech Processing Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Enhancement, Separation, and Target Spe…
☆4,315Aug 14, 2025Updated 11 months ago
huggingface / speech-to-speech
View on GitHub
Build local voice agents with open-source models
☆6,196Updated this week
MatthewCYM / VoiceBench
View on GitHub
[TACL'26] VoiceBench: Benchmarking LLM-Based Voice Assistants
☆378Jun 11, 2026Updated last month
0nutation / SpeechGPT
View on GitHub
SpeechGPT Series: Speech Large Language Models
☆1,402Jul 22, 2024Updated last year
Managed hosting for WordPress and PHP on Cloudways • Ad
Managed hosting for WordPress, Magento, Laravel, or PHP apps, on multiple cloud providers. Deploy in minutes on Cloudways by DigitalOcean.
mct10 / RepCodec
View on GitHub
Models and code for RepCodec: A Speech Representation Codec for Speech Tokenization
☆195Jul 12, 2024Updated 2 years ago
lifeiteng / OmniSenseVoice
View on GitHub
Omni SenseVoice: High-Speed Speech Recognition with words timestamps 🗣️🎯
☆897Dec 10, 2025Updated 7 months ago
ZhangXInFD / SpeechTokenizer
View on GitHub
This is the code for the SpeechTokenizer presented in the SpeechTokenizer: Unified Speech Tokenizer for Speech Language Models. Samples a…
☆658Jun 9, 2024Updated 2 years ago
hubertsiuzdak / snac
View on GitHub
Multi-Scale Neural Audio Codec (SNAC) compresses audio into discrete codes at a low bitrate
☆774Nov 19, 2024Updated last year
yangdongchao / LLM-Codec
View on GitHub
The open source code for LLM-Codec
☆147Aug 18, 2024Updated last year
metavoiceio / metavoice-src
View on GitHub
Foundational model for human-like, expressive TTS
☆4,203Jul 30, 2024Updated last year
WangHelin1997 / SSR-Speech
View on GitHub
SSR-Speech: Towards Stable, Safe and Robust Zero-shot Speech Editing and Synthesis
☆154Jan 1, 2025Updated last year