thomasmol/cog-whisper-diarization

Cog implementation of transcribing + diarization pipeline with Whisper & Pyannote

☆237

Alternatives and similar repositories for cog-whisper-diarization

Users who are interested in cog-whisper-diarization are comparing it to the repositories listed below.

Mastering-Python-GT / Transcription-diarization-whisper-pyannote
Transcription and diarization (speaker identification)
☆33 · May 31, 2023 · Updated 3 years ago
MahmoudAshraf97 / whisper-diarization
Automatic Speech Recognition with Speaker Diarization based on OpenAI Whisper
☆5,601 · Feb 23, 2026 · Updated 4 months ago
victor-upmeet / whisperx-replicate
☆46 · May 13, 2026 · Updated 2 months ago
pengzhendong / ngram-punctuator
An N-gram punctuator for Chinese and English.
☆18 · Oct 14, 2025 · Updated 9 months ago
huangruizhe / audio
Data manipulation and transformation for audio signal processing, powered by PyTorch
☆10 · Sep 30, 2024 · Updated last year
m-bain / whisperX
WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)
☆23,143 · Jul 13, 2026 · Updated last week
LAION-AI / Vocalino-V0.1-Voice-Acting-Pipeline
Open-weights voice acting pipeline combining zero-shot voice cloning with natural-language direction. Provide a reference voice (or gener…
☆16 · May 25, 2026 · Updated last month
KoljaB / WhoSpeaks
Efficient approach to speaker diarization using voice characteristics extraction
☆109 · Jun 26, 2026 · Updated 3 weeks ago
Mddct / usm-tokenizer
semantic tokenizer for speech and music
☆20 · Jul 6, 2025 · Updated last year
pashanitw / W2V2-BERT-ASR-Training
☆15 · Mar 25, 2024 · Updated 2 years ago
Mddct / transformer-vocos
☆35 · Sep 6, 2025 · Updated 10 months ago
shigabeev / russian_tts_normalization
Fast Russian Text normalization for TTS using only RegEx.
☆36 · Jun 27, 2026 · Updated 3 weeks ago
Pwntus / replicate-support-gpt
Get support, in seconds.
☆13 · Jan 22, 2024 · Updated 2 years ago
merlresearch / sebbs
Prediction of sound event bounding boxes (SEBBs)
☆35 · Aug 2, 2024 · Updated last year
arihanv / Shush
Shush is an app that deploys a WhisperV3 model with Flash Attention v2 on Modal and makes requests to it via a NextJS app
☆227 · Jun 7, 2024 · Updated 2 years ago
pyannote / pyannote-audio
Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker…
☆10,302 · Updated this week
andreyryabtsev / comfyui-python-api
Utilities library for working with the ComfyUI API
☆46 · Dec 19, 2023 · Updated 2 years ago
Kazuhito00 / onnx-model-encrypt-sample
Sample of encrypting and decrypting an ONNX model using pyca/cryptography
☆16 · Mar 19, 2022 · Updated 4 years ago
Vaibhavs10 / insanely-fast-whisper
☆12,988 · Oct 25, 2025 · Updated 8 months ago
silviapfeiffer / WebVTT-with-regions
Experimental implementation of regions in WebVTT building on Anne's WebVTT parser.
☆14 · Oct 19, 2014 · Updated 11 years ago
JarodMica / StyleTTS-ZS
StyleTTS-ZS: Efficient High-Quality Zero-Shot Text-to-Speech Synthesis with Distilled Time-Varying Style Diffusion
☆10 · Sep 22, 2024 · Updated last year
google / df-conformer
Audio samples accompanying publications related to DF-Conformer, a speech enhancement model.
☆36 · Jun 23, 2026 · Updated 3 weeks ago
JaesungHuh / SimpleDiarization
Simple diarization model
☆53 · Jun 13, 2025 · Updated last year
lucataco / cog-nsfw_image_detection
Cog wrapper for FalconsAi / nsfw_image_detection
☆19 · Aug 6, 2025 · Updated 11 months ago
h-munakata / Lighthouse-Wrapper-for-Audio-Moment-Retrieval
☆13 · Mar 23, 2026 · Updated 3 months ago
nsu-ai-team / russian_g2p_neuro
Experiments with grapheme2phoneme for Russian based on the artificial neural networks
☆20 · Apr 1, 2021 · Updated 5 years ago
meronym / speaker-diarization
Speaker diarization model
☆31 · Apr 1, 2023 · Updated 3 years ago
revdotcom / reverb
Open source inference code for Rev's model
☆436 · Apr 22, 2025 · Updated last year
chenpk00 / IS2024_stream_decoder_only_asr
☆16 · Mar 12, 2024 · Updated 2 years ago
yusunnny / CST-former
CST-former: Transformer with Channel-Spectro-Temporal Attention for Sound Event Localization and Detection (ICASSP 2024)
☆38 · May 20, 2025 · Updated last year
linto-ai / whisper-timestamped
Multilingual Automatic Speech Recognition with word-level timestamps and confidence
☆2,825 · Sep 9, 2025 · Updated 10 months ago
zhvng / ai-crossword
Generate crossword puzzles with GPT-3
☆12 · Jan 15, 2024 · Updated 2 years ago
FanaHOVA / smol-podcaster
smol-podcaster is your podcast production agent 🎙️
☆413 · Nov 10, 2025 · Updated 8 months ago
JaesungHuh / av-diarization
Audio-visual diarization pipeline used for creating VoxConverse dataset
☆22 · Jun 6, 2025 · Updated last year
LAION-AI / emotional-speech-annotations
This repository contains prompts & best practices to annotate audio clips with a very high degree of details using Audio-Language-Models
☆35 · Oct 13, 2024 · Updated last year
LAION-AI / emotion-annotations
☆110 · Updated this week
mubingshen / MLC-SLM-Baseline
The project is associated with the recently-launched INTERSPEECH 2025 Workshop on Multilingual Conversational Speech Language Model (MLC-…
☆51 · May 14, 2025 · Updated last year
Hannes1 / react-native-wenet
Wenet speech to text for react native
☆10 · Nov 1, 2022 · Updated 3 years ago
camenduru / VisualStylePrompting-jupyter
☆13 · Mar 15, 2024 · Updated 2 years ago