💬 ASR FastAPI server using faster-whisper and Multi-Scale Auto-Tuning Spectral Clustering for diarization.
☆217 · Oct 30, 2024 · Updated last year
Alternatives and similar repositories for wordcab-transcribe
Users that are interested in wordcab-transcribe are comparing it to the libraries listed below
- This script is an automated survey bot that conducts political discussions over phone calls. It uses Flask, Twilio's Voice API, OpenAI's … ☆12 · Sep 21, 2023 · Updated 2 years ago
- ☆357 · Mar 17, 2024 · Updated last year
- Zero-shot multimodal punctuation insertion and truecasing using Whisper ☆119 · Feb 4, 2023 · Updated 3 years ago
- Automatic Speech Recognition with Speaker Diarization based on OpenAI Whisper ☆5,409 · Feb 23, 2026 · Updated 2 weeks ago
- UTAUTAI(Unrestricted Tune Automated Technology Artificial Interigence) ☆15 · Oct 27, 2023 · Updated 2 years ago
- An Optimized Speech-to-Text Pipeline for the Whisper Model Supporting Multiple Inference Engine ☆542 · Aug 27, 2024 · Updated last year
- Unofficial implementation of ConvNeXt-TTS powered by lightning ☆18 · Oct 20, 2024 · Updated last year
- speechlib is a library that can do speaker diarization, transcription and speaker recognition on an audio file to create transcripts with… ☆252 · Feb 10, 2026 · Updated 3 weeks ago
- VoiceSplit: Targeted Voice Separation by Speaker-Conditioned Spectrogram ☆265 · Jul 25, 2024 · Updated last year
- ☆52 · Jun 24, 2025 · Updated 8 months ago
- Experimental code: sound file preprocessing to optimize Whisper transcriptions without hallucinated texts ☆348 · Nov 12, 2024 · Updated last year
- Transcription, forced alignment, and audio indexing with OpenAI's Whisper ☆2,169 · Oct 29, 2025 · Updated 4 months ago
- Prosody and Pronunciation Modification Network ☆63 · May 5, 2025 · Updated 10 months ago
- Implementations of growing and pruning in neural networks ☆22 · Jul 26, 2023 · Updated 2 years ago
- Minimal extension of OpenAI's Whisper adding speaker diarization with special tokens ☆539 · Nov 6, 2023 · Updated 2 years ago
- A python package to build AI-powered real-time audio applications ☆1,938 · Feb 12, 2025 · Updated last year
- WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization) ☆20,556 · Feb 22, 2026 · Updated 2 weeks ago
- Coqui STT Model Manager - install, manage and try out Coqui STT models from the Model Zoo ☆26 · Mar 24, 2023 · Updated 2 years ago
- JAX implementation of OpenAI's Whisper model for up to 70x speed-up on TPU. ☆4,686 · Apr 3, 2024 · Updated last year
- Whisper realtime streaming for long speech-to-text transcription and translation ☆3,546 · Nov 12, 2025 · Updated 3 months ago
- ez audio transcription tool with flexible processing and post-processing options ☆163 · Feb 1, 2024 · Updated 2 years ago
- Median is an open-source flashcard application that leverages the power of spaced repetition and artificial intelligence to transform the… ☆22 · Nov 4, 2024 · Updated last year
- ☆23 · Oct 17, 2024 · Updated last year
- Torch implementation of Whisper-guided DDPM based Voice Conversion ☆49 · Mar 7, 2023 · Updated 3 years ago
- This repository contains audio samples and supplementary materials accompanying publications by the "Speaker, Voice and Language" team at… ☆441 · Aug 12, 2025 · Updated 6 months ago
- Style-Controllable Zero-Shot Text to Speech Synthesizer based on VALL-E ☆135 · Oct 23, 2024 · Updated last year
- ☆21 · Mar 3, 2025 · Updated last year
- This repository contains all the code necessary for running the multilingual distilwhisper from Ferraz et al. 2024 IEEE ICASSP paper. ☆33 · Oct 23, 2025 · Updated 4 months ago
- Streaming transcriber with whisper ☆696 · May 1, 2023 · Updated 2 years ago
- ONNX Inference of Pyannote Segmentation ☆97 · Dec 23, 2024 · Updated last year
- EMNLP 23 - Integrating Whisper Encoder to LLaMA Decoder for Generative ASR Error Correction ☆269 · May 19, 2024 · Updated last year
- The official pytorch implementation of the Interspeech 2024 paper "Reshape Dimensions Network for Speaker Recognition" ☆186 · Sep 24, 2025 · Updated 5 months ago
- Implementation for paper "Disentangled Speech Representation Learning for One-Shot Cross-Lingual Voice Conversion Using β-VAE" ☆44 · Apr 10, 2023 · Updated 2 years ago
- Multilingual Automatic Speech Recognition with word-level timestamps and confidence ☆2,773 · Sep 9, 2025 · Updated 6 months ago
- KATube is a tool to automate the process of creating datasets for training Text-To-Speech (TTS) and Speech-To-Text (STT) models. From a l… ☆25 · Jul 27, 2024 · Updated last year
- Contains Colab Notebooks showing cool use-cases of different GCP ML APIs. ☆10 · Nov 5, 2020 · Updated 5 years ago
- Stream torrents to VLC using Peerflix and torrent using your terminal ☆10 · Feb 15, 2018 · Updated 8 years ago
- Open Source Speech Inferencing Library for Indic Languages ☆13 · Apr 11, 2022 · Updated 3 years ago
- semantic tokenizer for speech and music ☆21 · Jul 6, 2025 · Updated 8 months ago