🐸STT - The deep learning toolkit for Speech-to-Text. Training and deploying STT models has never been so easy.
☆ 2,569 · Mar 11, 2024 · Updated last year
Alternatives and similar repositories for STT
Users that are interested in STT are comparing it to the libraries listed below
- 🐸STT integration examples · ☆ 130 · Sep 23, 2022 · Updated 3 years ago
- 🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production · ☆ 44,691 · Aug 16, 2024 · Updated last year
- DeepSpeech is an open source embedded (offline, on-device) speech-to-text engine which can run in real time on devices ranging from a Ras… · ☆ 26,736 · Jun 19, 2025 · Updated 8 months ago
- A PyTorch-based Speech Toolkit · ☆ 11,277 · Updated this week
- Deep learning for Text to Speech (Discussion forum: https://discourse.mozilla.org/c/tts) · ☆ 10,118 · Nov 9, 2023 · Updated 2 years ago
- Offline speech recognition API for Android, iOS, Raspberry Pi and servers with Python, Java, C# and Node · ☆ 14,301 · Feb 22, 2026 · Updated last week
- A list of accessible speech corpora for ASR, TTS, and other Speech Technologies · ☆ 1,386 · Jun 6, 2024 · Updated last year
- Facebook AI Research's Automatic Speech Recognition Toolkit · ☆ 6,446 · Jan 12, 2026 · Updated last month
- kaldi-asr/kaldi is the official location of the Kaldi project. · ☆ 15,331 · Sep 22, 2025 · Updated 5 months ago
- End-to-End Speech Processing Toolkit · ☆ 9,747 · Feb 26, 2026 · Updated last week
- Coqui's machine learning job scheduler · ☆ 31 · Sep 5, 2021 · Updated 4 years ago
- Silero Models: pre-trained text-to-speech models made embarrassingly simple · ☆ 5,793 · Feb 3, 2026 · Updated last month
- A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Auto… · ☆ 16,843 · Updated this week
- Robust Speech Recognition via Large-Scale Weak Supervision · ☆ 95,527 · Dec 15, 2025 · Updated 2 months ago
- 🐸 collection of TTS papers · ☆ 723 · Jul 4, 2024 · Updated last year
- Open models for Coqui STT · ☆ 153 · May 9, 2023 · Updated 2 years ago
- Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker… · ☆ 9,274 · Feb 20, 2026 · Updated 2 weeks ago
- Tools for handling multimodal data in machine learning projects. · ☆ 1,116 · Updated this week
- Port of OpenAI's Whisper model in C/C++ · ☆ 47,262 · Updated this week
- TensorFlowASR: Almost State-of-the-art Automatic Speech Recognition in Tensorflow 2. Supported languages that can use characters or subw… · ☆ 1,006 · Jun 11, 2025 · Updated 8 months ago
- Silero VAD: pre-trained enterprise-grade Voice Activity Detector · ☆ 8,279 · Feb 24, 2026 · Updated last week
- Faster Whisper transcription with CTranslate2 · ☆ 21,289 · Nov 19, 2025 · Updated 3 months ago
- Facebook AI Research Sequence-to-Sequence Toolkit written in Python. · ☆ 32,170 · Sep 30, 2025 · Updated 5 months ago
- 🐸TTS recipes for different datasets · ☆ 86 · Jul 26, 2022 · Updated 3 years ago
- Large, modern dataset for speech recognition · ☆ 721 · Feb 26, 2024 · Updated 2 years ago
- ☆ 357 · Mar 17, 2024 · Updated last year
- TensorFlowTTS: Real-Time State-of-the-art Speech Synthesis for Tensorflow 2 (supported including English, French, Korean, Chinese, Germa… · ☆ 3,993 · Jul 5, 2024 · Updated last year
- WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization) · ☆ 20,368 · Feb 22, 2026 · Updated last week
- A C++ standalone library for machine learning · ☆ 5,437 · Feb 23, 2026 · Updated last week
- An Open Source text-to-speech system built by inverting Whisper. · ☆ 4,567 · Dec 14, 2025 · Updated 2 months ago
- An On-Premises, Streaming Speech Recognition System · ☆ 682 · Nov 28, 2021 · Updated 4 years ago
- A multi-voice TTS system trained with an emphasis on quality · ☆ 14,818 · Nov 19, 2024 · Updated last year
- State-of-the-art deep learning based audio codec supporting both mono 24 kHz audio and stereo 48 kHz audio. · ☆ 3,901 · Jan 4, 2024 · Updated 2 years ago
- Fast inference engine for Transformer models · ☆ 4,342 · Feb 4, 2026 · Updated last month
- eSpeak NG is an open source speech synthesizer that supports more than a hundred languages and accents. · ☆ 6,181 · Updated this week
- Text-Prompted Generative Audio Model · ☆ 39,039 · Aug 19, 2024 · Updated last year
- Espresso: A Fast End-to-End Neural Speech Recognition Toolkit · ☆ 940 · Sep 4, 2024 · Updated last year
- A fast, local neural text to speech system · ☆ 10,633 · Aug 26, 2025 · Updated 6 months ago
- Simple text to phones converter for multiple languages · ☆ 1,515 · Sep 26, 2024 · Updated last year