coqui-ai / STT
🐸STT - The deep learning toolkit for Speech-to-Text. Training and deploying STT models has never been so easy.
★2,453 · Updated last year
Alternatives and similar repositories for STT
Users who are interested in STT are comparing it to the libraries listed below.
- Examples of how to use or integrate DeepSpeech ★852 · Updated last year
- A list of accessible speech corpora for ASR, TTS, and other Speech Technologies ★1,336 · Updated last year
- Deep learning for Text to Speech (Discussion forum: https://discourse.mozilla.org/c/tts) ★9,878 · Updated last year
- StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models ★5,799 · Updated 10 months ago
- TensorFlowTTS: Real-Time State-of-the-art Speech Synthesis for Tensorflow 2 (supported including English, French, Korean, Chinese, Germa… ★3,942 · Updated 11 months ago
- Text to Speech engine based on the Tacotron architecture, initially implemented by Keith Ito. ★584 · Updated 3 years ago
- YourTTS: Towards Zero-Shot Multi-Speaker TTS and Zero-Shot Voice Conversion for everyone ★981 · Updated 7 months ago
- A python package to analyze and compare voices with deep learning ★3,001 · Updated last year
- Unified-Modal Speech-Text Pre-Training for Spoken Language Processing ★1,368 · Updated last year
- Open Text to Speech Server ★1,067 · Updated last year
- 🤖💬 Transformer TTS: Implementation of a non-autoregressive Transformer based neural network for text to speech. ★1,149 · Updated last year
- Mimic Recording Studio is a Docker-based application you can install to record voice samples, which can then be trained into a TTS voice … ★509 · Updated 2 years ago
- A TensorFlow implementation of Google's Tacotron speech synthesis with pre-trained model (unofficial) ★2,977 · Updated last year
- Distilled variant of Whisper for speech recognition. 6x faster, 50% smaller, within 1% word error rate. ★3,883 · Updated 5 months ago
- An implementation of Tacotron 2 that supports multilingual experiments with parameter-sharing, code-switching, and voice cloning. ★837 · Updated last year
- eSpeak NG is an open source speech synthesizer that supports more than a hundred languages and accents. ★5,193 · Updated last week
- Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker… ★7,697 · Updated last week
- A fast, local neural text to speech system ★9,342 · Updated last month
- Text-prompted Generative Audio Model - With the ability to clone voices ★3,308 · Updated last year
- A fast local neural text to speech engine for Mycroft ★1,193 · Updated 2 months ago
- WebSocket, gRPC and WebRTC speech recognition server based on Vosk and Kaldi libraries ★1,103 · Updated 3 weeks ago
- End-to-End Speech Processing Toolkit ★9,205 · Updated this week
- TensorFlowASR: Almost State-of-the-art Automatic Speech Recognition in Tensorflow 2. Supported languages that can use characters or subw… ★982 · Updated last week
- HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis ★2,162 · Updated 10 months ago
- An Open Source text-to-speech system built by inverting Whisper. ★4,286 · Updated last week
- On-device streaming speech-to-text engine powered by deep learning ★631 · Updated last week
- VOSK Speech Recognition Toolkit ★445 · Updated 2 years ago
- Unofficial Parallel WaveGAN (+ MelGAN & Multi-band MelGAN & HiFi-GAN & StyleMelGAN) with Pytorch ★1,607 · Updated last year
- 🐸 collection of TTS papers ★697 · Updated 11 months ago
- Facebook AI Research's Automatic Speech Recognition Toolkit ★6,432 · Updated 6 months ago