coqui-ai / STT-models
Open models for Coqui STT
★138 · Updated last year
Alternatives and similar repositories for STT-models:
Users who are interested in STT-models are comparing it to the repositories listed below.
- 🐸STT integration examples ★127 · Updated 2 years ago
- Live speech recognition using Facebook's wav2vec 2.0 model. ★352 · Updated last year
- On-device voice activity detection (VAD) powered by deep learning ★208 · Updated this week
- A tokenizer, text cleaner, and phonemizer for many human languages. ★310 · Updated 5 months ago
- C++ library for converting text to phonemes for Piper ★117 · Updated last year
- Simplified diarization pipeline using some pretrained models - audio file to diarized segments in a few lines of code ★148 · Updated last year
- SEPIA server to support open-source speech recognition via WebSocket connection. ★126 · Updated 5 months ago
- Speech recognition & diarisation solution with text alignment, deployed in AML pipelines ★94 · Updated 11 months ago
- ★326 · Updated 10 months ago
- Go from raw audio files to a text-audio dataset automatically with OpenAI's Whisper. ★136 · Updated last year
- Official Implementation of StyleTTS ★431 · Updated 3 months ago
- Voice models for Mimic 3 text to speech system ★143 · Updated 10 months ago
- 🐸 - A general purpose model trainer, as flexible as it gets ★214 · Updated last year
- Desktop application for neural speech synthesis written in C++ ★215 · Updated 2 years ago
- On-device noise suppression powered by deep learning ★69 · Updated 3 weeks ago
- ★255 · Updated last year
- OpenVINO version of openai/whisper ★166 · Updated last year
- ★359 · Updated 8 months ago
- PyTorch code implementation of EfficientSpeech - to be presented at ICASSP 2023. ★166 · Updated last year
- Real-Time Whisper Voice Recognition with vosk model feedback. ★112 · Updated last year
- Nix-TTS: Lightweight and End-to-end Text-to-Speech via Module-wise Distillation ★249 · Updated last year
- Putting flows on top of neural transducers for better TTS ★62 · Updated last month
- Create labeled datasets, enhance audio quality, identify speakers, support diverse dataset types. Advanced audio processing. ★243 · Updated 10 months ago
- Model for recasing and repunctuating ASR transcripts ★133 · Updated last year
- An espeak-compatible, permissively-licensed IPA phonemizer (G2P) based on DeepPhonemizer. Usable as a drop-in replacement for espeak's GP… ★95 · Updated 6 months ago
- Grapheme to phoneme conversion with deep learning. ★381 · Updated last year
- Text to speech alignment using CTC forced alignment ★279 · Updated last month
- Coqui AI TTS plugin ★74 · Updated last month
- FreeVC: Towards High-Quality Text-Free One-Shot Voice Conversion ★661 · Updated 3 months ago
- Whisper realtime streaming for long speech-to-text transcription and translation ★114 · Updated last year