pengzhendong/speaker-diarization

Offline Speaker Diarization with SenseVoice by Sherpa ONNX.

☆15

Alternatives and similar repositories for speaker-diarization

Users who are interested in speaker-diarization are comparing it to the libraries listed below.

pengzhendong / streaming-asr
One command to start a streaming ASR server.
☆12 · Oct 2, 2024 · Updated last year
pengzhendong / audio-pipeline
☆23 · Oct 17, 2024 · Updated last year
Mddct / transformer-vocos
☆35 · Sep 6, 2025 · Updated 10 months ago
pengzhendong / streaming-tts-webui
Streaming text-to-speech web UI
☆22 · May 6, 2024 · Updated 2 years ago
Mddct / simple-tts
(WIP) Long-form speech generation
☆30 · Apr 2, 2025 · Updated last year
pengzhendong / asr-decoder
CTC decoder with hotwords for ASR.
☆38 · Jun 15, 2026 · Updated last month
pengzhendong / pysilero
Python wrapper for Silero VAD
☆63 · May 8, 2025 · Updated last year
lovemefan / Silero-vad-pytorch
PyTorch implementation of Silero VAD
☆38 · Nov 23, 2024 · Updated last year
MaxMax2016 / max-vc
Singing voice conversion without F0
☆23 · May 10, 2023 · Updated 3 years ago
leospark / FireRedVAD-Engineering
Lightweight streaming Voice Activity Detection (VAD) tool with ONNX Runtime
☆24 · Mar 18, 2026 · Updated 4 months ago
hhguo / SoCodec
Ultra-low-bitrate speech codec for speech language modeling applications
☆92 · Dec 20, 2024 · Updated last year
fireredchat-submodules / livekit-plugins-fireredchat-pvad
FireRedChat pVAD plugin for LiveKit Agents
☆22 · Sep 16, 2025 · Updated 10 months ago
philgzl / brever
Speech enhancement in noisy and reverberant environments using deep neural networks
☆23 · Oct 10, 2025 · Updated 9 months ago
pengzhendong / audiolab
A streaming audio reader, processor, and writer built on top of soundfile and PyAV (bindings for FFmpeg)
☆39 · Mar 31, 2026 · Updated 3 months ago
yoongi43 / VRVQ
Implementation of the paper "Variable Bitrate Residual Vector Quantization for Audio Coding"
☆11 · Apr 10, 2025 · Updated last year
TaoRuijie / MFV-KSD
Multi-Stage Face-Voice Association Learning with Keynote Speaker Diarization (ACM MM 2024)
☆22 · Jul 25, 2024 · Updated 2 years ago
liuhuang31 / HiFTNet-sr
HiFTNet audio super-resolution from 16/24 kHz to 48 kHz
☆24 · Jan 2, 2024 · Updated 2 years ago
PINTO0309 / onnx-aec
A playground for experimenting with acoustic echo cancellation using a microphone, speaker, and ONNX.
☆13 · Oct 22, 2024 · Updated last year
LAION-AI / emotional-speech-annotations
This repository contains prompts and best practices for annotating audio clips in very fine detail using audio-language models
☆35 · Oct 13, 2024 · Updated last year
Mddct / cosyvoice2-flow-optimized
Faster inference
☆27 · Jan 20, 2025 · Updated last year
pengzhendong / streaming-vocos
Streaming Vocos
☆31 · Jun 10, 2025 · Updated last year
pengzhendong / compute-wer
Compute WER and SER for speech recognition evaluation
☆27 · Jun 6, 2026 · Updated last month
pengzhendong / wetext
Python runtime for WeTextProcessing (does not depend on Pynini)
☆53 · Jun 11, 2026 · Updated last month
pengzhendong / streaming-sensevoice
Pseudo-streaming SenseVoice with hotwords
☆467 · Jun 15, 2026 · Updated last month
pengzhendong / pyannote-onnx
ONNX inference of Pyannote segmentation
☆99 · Dec 23, 2024 · Updated last year
pkufool / simple-wer
A simple command-line tool to calculate WER for ASR.
☆14 · Updated this week
Anvarjon / Age-Gender-Classification
Official implementation of the paper titled "Age and Gender Recognition Using a Convolutional Neural Network with a Specially Designed Mu…
☆28 · Mar 5, 2024 · Updated 2 years ago
lifeiteng / OmniVAD-Kit
Cross-platform VAD & Audio Event Detection toolkit: Python (PyPI) + TypeScript (npm) + C API. DFSMN models ~2MB, 200x real-time. Runs ev…
☆93 · Jul 14, 2026 · Updated 2 weeks ago
zhai-lw / L3AC
A lightweight audio codec based on a single quantizer
☆36 · Sep 4, 2025 · Updated 10 months ago
interactiveaudiolab / emphases
Crowdsourced and automatic speech prominence estimation
☆27 · Apr 12, 2024 · Updated 2 years ago
AI-S2-Lab / GPT-Talker
[ACMMM'2024] Generative Expressive Conversational Speech Synthesis
☆45 · Oct 28, 2024 · Updated last year
pengzhendong / g2p-mix
Grapheme-to-phoneme conversion for mixed Chinese (Mandarin or Cantonese) and English.
☆115 · Updated this week
rishikksh20 / MiniMax-TTS-pytorch
An attempt to replicate the MiniMaxTTS architecture described in its technical report
☆47 · Sep 2, 2025 · Updated 10 months ago
Mddct / usm-tokenizer
Semantic tokenizer for speech and music
☆20 · Jul 6, 2025 · Updated last year
dengcunqin / noise-reduction
Noise reduction
☆17 · Jul 3, 2024 · Updated 2 years ago
BiSinger-SVS / BiSinger
Bilingual singing voice synthesis
☆18 · Mar 25, 2024 · Updated 2 years ago
Audio-WestlakeU / CleanMel
PyTorch implementation of "CleanMel: Mel-Spectrogram Enhancement for Improving Both Speech Quality and ASR".
☆94 · Feb 2, 2026 · Updated 5 months ago
caizexin / GenVC
Self-supervised generative LM-based voice conversion
☆58 · Apr 24, 2025 · Updated last year
backspacetg / distilXLSR
Models and code for the INTERSPEECH 2023 paper "DistilXLSR: A Light Weight Cross-Lingual Speech Representation Model"
☆13 · Mar 30, 2025 · Updated last year