Lightning-Fast, On-Device, Multilingual TTS — running natively via ONNX.
☆2,685Jan 22, 2026Updated last month
Alternatives and similar repositories for supertonic
Users interested in supertonic are comparing it to the libraries listed below.
- LEMAS‑TTS is a multilingual zero‑shot text‑to‑speech system, supporting 10 languages: Chinese English Spanish Russian French German Ital…☆91Jan 14, 2026Updated last month
- TTS model capable of streaming conversational audio in realtime.☆1,081Nov 29, 2025Updated 3 months ago
- [EMNLP 2025 Findings] Official code for EZ-VC: Easy Zero-shot Any-to-Any Voice Conversion☆35Sep 9, 2025Updated 6 months ago
- A TTS model capable of generating ultra-realistic dialogue in one pass.☆19,153Nov 19, 2025Updated 3 months ago
- Towards Human-Sounding Speech☆5,983Dec 5, 2025Updated 3 months ago
- Training, inference, and testing of the SAC speech codec model.☆100Nov 1, 2025Updated 4 months ago
- Interface for OuteTTS models.☆1,427Jun 21, 2025Updated 8 months ago
- DEX-TTS: Diffusion-based EXpressive TTS with Style Modeling on Time Variability☆107Jan 17, 2025Updated last year
- Zonos-v0.1 is a leading open-weight text-to-speech model trained on more than 200k hours of varied multilingual speech, delivering expres…☆7,193Mar 5, 2025Updated last year
- SOTA Open Source TTS☆25,154Updated this week
- Soprano: Instant, Ultra-Realistic Text-to-Speech☆1,197Jan 15, 2026Updated last month
- ☆100Jan 19, 2026Updated last month
- A TTS that fits in your CPU (and pocket)☆3,430Mar 1, 2026Updated last week
- SoTA open-source TTS☆22,998Feb 3, 2026Updated last month
- Inference and training library for high-quality TTS models.☆5,547Dec 10, 2024Updated last year
- On-device TTS model by Neuphonic☆4,880Feb 26, 2026Updated last week
- Controllable and fast Text-to-Speech for over 7000 languages!☆2,190Jan 25, 2026Updated last month
- Silero VAD: pre-trained enterprise-grade Voice Activity Detector☆8,384Updated this week
- StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models☆6,196Aug 10, 2024Updated last year
- https://hf.co/hexgrad/Kokoro-82M☆5,847Aug 6, 2025Updated 7 months ago
- A highly optimized engine for neutts-air model to generate minutes of audio in seconds. Over 200x realtime on modern hardware!☆113Nov 24, 2025Updated 3 months ago
- Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"☆14,169Updated this week
- VoXtream is a Full-Stream Zero-shot TTS model with Extremely Low Latency☆190Oct 26, 2025Updated 4 months ago
- Soprano-Factory: Train your own 2000x realtime text-to-speech model☆211Jan 13, 2026Updated last month
- State-of-the-art TTS model under 25MB 😻☆11,144Feb 24, 2026Updated last week
- Try to replicate the architecture of MiniMaxTTS mentioned in its technical report☆49Sep 2, 2025Updated 6 months ago
- 🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production☆44,763Aug 16, 2024Updated last year
- PromptTTS++: Controlling Speaker Identity in Prompt-Based Text-To-Speech Using Natural Language Descriptions☆84Oct 11, 2024Updated last year
- Moshi is a speech-text foundation model and full-duplex spoken dialogue framework. It uses Mimi, a state-of-the-art streaming neural audi…☆9,799Updated this week
- A real-time streaming conversational video system that transforms text interactions into continuous, high-fidelity video responses using …☆307Dec 15, 2025Updated 2 months ago
- A package for NeuCodec: a 50hz, 0.8kbps, 24kHz audio codec.☆153Jan 27, 2026Updated last month
- ☆6,070Aug 29, 2025Updated 6 months ago
- Open-Source Frontier Voice AI☆23,610Feb 28, 2026Updated last week
- Foundational model for human-like, expressive TTS☆4,198Jul 30, 2024Updated last year
- ☆454Nov 2, 2025Updated 4 months ago
- An Open Source text-to-speech system built by inverting Whisper.☆4,568Dec 14, 2025Updated 2 months ago
- Kanade is a single-layer disentangled speech tokenizer that extracts compact tokens suitable for both generative and discriminative model…☆82Feb 3, 2026Updated last month
- Instant voice cloning by MIT and MyShell. Audio foundation model.☆36,049Apr 19, 2025Updated 10 months ago
- MOSS-TTSD is a spoken dialogue generation model designed for expressive multi-speaker synthesis. It features long-context modeling, flex…☆1,191Mar 2, 2026Updated last week