Wordcab / wordcab-transcribe
💬 ASR FastAPI server using faster-whisper and Multi-Scale Auto-Tuning Spectral Clustering for diarization.
★217 · Oct 30, 2024 · Updated last year
Alternatives and similar repositories for wordcab-transcribe
Users who are interested in wordcab-transcribe are comparing it to the libraries listed below.
- This script is an automated survey bot that conducts political discussions over phone calls. It uses Flask, Twilio's Voice API, OpenAI's … · ★11 · Sep 21, 2023 · Updated 2 years ago
- ★357 · Mar 17, 2024 · Updated last year
- Zero-shot multimodal punctuation insertion and truecasing using Whisper · ★119 · Feb 4, 2023 · Updated 3 years ago
- Automatic Speech Recognition with Speaker Diarization based on OpenAI Whisper · ★5,355 · Nov 26, 2025 · Updated 2 months ago
- ASR + diarization model server with speculative decoding · ★64 · May 22, 2024 · Updated last year
- An Optimized Speech-to-Text Pipeline for the Whisper Model Supporting Multiple Inference Engine · ★540 · Aug 27, 2024 · Updated last year
- UTAUTAI(Unrestricted Tune Automated Technology Artificial Interigence) · ★15 · Oct 27, 2023 · Updated 2 years ago
- Unofficial implementation of ConvNeXt-TTS powered by lightning · ★18 · Oct 20, 2024 · Updated last year
- speechlib is a library that can do speaker diarization, transcription and speaker recognition on an audio file to create transcripts with… · ★250 · Updated this week
- VoiceSplit: Targeted Voice Separation by Speaker-Conditioned Spectrogram · ★265 · Jul 25, 2024 · Updated last year
- ★52 · Jun 24, 2025 · Updated 7 months ago
- Experimental code: sound file preprocessing to optimize Whisper transcriptions without hallucinated texts · ★348 · Nov 12, 2024 · Updated last year
- Transcription, forced alignment, and audio indexing with OpenAI's Whisper · ★2,158 · Oct 29, 2025 · Updated 3 months ago
- Prosody and Pronunciation Modification Network · ★62 · May 5, 2025 · Updated 9 months ago
- Implementations of growing and pruning in neural networks · ★22 · Jul 26, 2023 · Updated 2 years ago
- Minimal extension of OpenAI's Whisper adding speaker diarization with special tokens · ★536 · Nov 6, 2023 · Updated 2 years ago
- WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization) · ★20,051 · Updated this week
- A python package to build AI-powered real-time audio applications · ★1,931 · Feb 12, 2025 · Updated last year
- Coqui STT Model Manager - install, manage and try out Coqui STT models from the Model Zoo · ★26 · Mar 24, 2023 · Updated 2 years ago
- JAX implementation of OpenAI's Whisper model for up to 70x speed-up on TPU. · ★4,683 · Apr 3, 2024 · Updated last year
- ez audio transcription tool with flexible processing and post-processing options · ★162 · Feb 1, 2024 · Updated 2 years ago
- Whisper realtime streaming for long speech-to-text transcription and translation · ★3,530 · Nov 12, 2025 · Updated 3 months ago
- ★23 · Oct 17, 2024 · Updated last year
- Torch implementation of Whisper-guided DDPM based Voice Conversion · ★49 · Mar 7, 2023 · Updated 2 years ago
- This repository contains audio samples and supplementary materials accompanying publications by the "Speaker, Voice and Language" team at… · ★439 · Aug 12, 2025 · Updated 6 months ago
- Style-Controllable Zero-Shot Text to Speech Synthesizer based on VALL-E · ★135 · Oct 23, 2024 · Updated last year
- ★21 · Mar 3, 2025 · Updated 11 months ago
- This repository contains all the code necessary for running the multilingual distilwhisper from Ferraz et al. 2024 IEEE ICASSP paper. · ★33 · Oct 23, 2025 · Updated 3 months ago
- Streaming transcriber with whisper · ★694 · May 1, 2023 · Updated 2 years ago
- ONNX Inference of Pyannote Segmentation · ★97 · Dec 23, 2024 · Updated last year
- EMNLP 23 - Integrating Whisper Encoder to LLaMA Decoder for Generative ASR Error Correction · ★267 · May 19, 2024 · Updated last year
- The official PyTorch implementation of the Interspeech 2024 paper "Reshape Dimensions Network for Speaker Recognition" · ★185 · Sep 24, 2025 · Updated 4 months ago
- Implementation for paper "Disentangled Speech Representation Learning for One-Shot Cross-Lingual Voice Conversion Using β-VAE" · ★44 · Apr 10, 2023 · Updated 2 years ago
- Multilingual Automatic Speech Recognition with word-level timestamps and confidence · ★2,759 · Sep 9, 2025 · Updated 5 months ago
- KATube is a tool to automate the process of creating datasets for training Text-To-Speech (TTS) and Speech-To-Text (STT) models. From a l… · ★25 · Jul 27, 2024 · Updated last year
- semantic tokenizer for speech and music · ★21 · Jul 6, 2025 · Updated 7 months ago
- Contains Colab Notebooks show cool use-cases of different GCP ML APIs. · ★10 · Nov 5, 2020 · Updated 5 years ago
- This project shows how to build a simple handwriting recognizer in Keras with the IAM dataset. · ★13 · Aug 15, 2021 · Updated 4 years ago
- CLASP: Contrastive Language-Speech Pretraining for Multilingual Multimodal Information Retrieval · ★13 · Jun 27, 2025 · Updated 7 months ago