linto-ai / linto-stt
An automatic speech recognition API
☆54Updated this week
Alternatives and similar repositories for linto-stt:
Users who are interested in linto-stt are comparing it to the libraries listed below
- On-device speaker diarization powered by deep learning☆37Updated 3 weeks ago
- Speaker diarization service☆21Updated 3 weeks ago
- Various speech datasets made available to the public☆114Updated 3 months ago
- Reproducible experimental protocols for multimedia (audio, video, text) database☆98Updated last month
- A curated list of awesome voice activity detection☆40Updated 3 months ago
- How to create your own model for vosk☆70Updated 3 years ago
- On-device voice activity detection (VAD) powered by deep learning☆201Updated last week
- Create an LJSpeech structured voice dataset on wave input☆26Updated 5 months ago
- 🐸STT integration examples☆125Updated 2 years ago
- Coqui STT Model Manager - install, manage and try out Coqui STT models from the Model Zoo☆25Updated last year
- Simplified diarization pipeline using some pretrained models - audio file to diarized segments in a few lines of code☆145Updated 10 months ago
- Model for recasing and repunctuating ASR transcripts☆133Updated 11 months ago
- C++ version of pyannote audio speaker diarization pipeline☆20Updated last year
- An even smaller speech recognizer / force aligner☆32Updated 2 months ago
- A model that predicts the punctuation of English, Italian, French and German texts.☆79Updated 2 years ago
- Speaker change detection using SincNet and an LSTM/Transformer☆47Updated 8 months ago
- ☆38Updated 3 years ago
- Mirror of hf.co/pyannote/speaker-diarization-3.1☆20Updated last year
- On-device noise suppression powered by deep learning☆67Updated 3 weeks ago
- ☆10Updated 2 weeks ago
- ONNX Inference of Pyannote Segmentation☆80Updated 2 months ago
- ☆53Updated last year
- A data annotation pipeline to generate high-quality, large-scale speech datasets with machine pre-labeling and fully manual auditing.☆101Updated last year
- Open models for Coqui STT☆130Updated last year
- Speaker diarization model☆24Updated last year
- 🌼 Daisy-TTS: Simulating Wider Spectrum of Emotions via Prosody Embedding Decomposition☆15Updated last year
- The human speaks a language with an accent. A particular accent necessarily reflects a person's linguistic background. The model defines …☆59Updated 3 years ago
- Zero-shot multimodal punctuation insertion and truecasing using Whisper☆109Updated 2 years ago
- Simple Kaldi model server for chain (nnet3) models in online recognition mode directly from a local microphone☆35Updated 3 years ago
- ☆39Updated last year