Speech recognition & diarisation solution with text alignment, deployed in AML pipelines
☆102 · May 7, 2024 · Updated 2 years ago
Alternatives and similar repositories for asr-sd-pipeline
Users that are interested in asr-sd-pipeline are comparing it to the libraries listed below.
- ez audio transcription tool with flexible processing and post-processing options ☆168 · Feb 1, 2024 · Updated 2 years ago
- web based editor for subtitles and transcripts ☆147 · Aug 16, 2024 · Updated last year
- 🎯 Speech Recognition Challenge by Speech Lab - IIT Madras ☆10 · Nov 5, 2020 · Updated 5 years ago
- WhisperX FastAPI integration ☆18 · Mar 31, 2024 · Updated 2 years ago
- A CLI speech recognition tool, using OpenAI Whisper, supports audio file transcription and near-realtime microphone input. ☆22 · Jun 6, 2026 · Updated last week
- Record audio or transcribe files using ctranslate2 and whisper! ☆200 · Jun 12, 2026 · Updated last week
- ASR client for Triton ASR Service ☆39 · Jan 12, 2026 · Updated 5 months ago
- Automatic Speech Recognition with Speaker Diarization based on OpenAI Whisper ☆5,563 · Feb 23, 2026 · Updated 3 months ago
- Whisper command line client compatible with original OpenAI client based on CTranslate2. ☆1,317 · Feb 14, 2026 · Updated 4 months ago
- A testing repo to share code and thoughts on diarisation ☆58 · Mar 26, 2024 · Updated 2 years ago
- CLASP: Contrastive Language-Speech Pretraining for Multilingual Multimodal Information Retrieval ☆13 · Jun 27, 2025 · Updated 11 months ago
- ☆22 · May 27, 2026 · Updated 3 weeks ago
- Code and pruned models for our paper: K. Gkrispanis, N. Gkalelis, V. Mezaris, "Filter-Pruning of Lightweight Face Detectors Using a Geome… ☆14 · May 8, 2024 · Updated 2 years ago
- Automatically setup the AISHELL-4 and MSDWild dataset for usage with pyannote-database (and pyannote-audio) ☆15 · Oct 22, 2025 · Updated 7 months ago
- Experimental code: sound file preprocessing to optimize Whisper transcriptions without hallucinated texts ☆350 · Nov 12, 2024 · Updated last year
- Minimal extension of OpenAI's Whisper adding speaker diarization with special tokens ☆546 · Nov 6, 2023 · Updated 2 years ago
- Incredibly descriptive audiovisual summaries for videos ☆41 · Aug 2, 2024 · Updated last year
- Whisper realtime streaming for long speech-to-text transcription and translation ☆3,640 · Nov 12, 2025 · Updated 7 months ago
- jupyter notebooks to fine tune whisper models on Vietnamese using Colab and/or Kaggle and/or AWS EC2 ☆19 · Aug 15, 2025 · Updated 10 months ago
- Transcribe and translate voice into LRC file using Whisper and LLMs (GPT, Claude, et al.). Use Whisper and LLMs (GPT, Claude, etc.) to transcribe and translate your audio into subtitle files. ☆661 · May 25, 2026 · Updated 3 weeks ago
- Unconstrained Text Detection with Box Supervision and Dynamic Self-Training ☆34 · Nov 24, 2022 · Updated 3 years ago
- A streaming whisper server for on-prem transcription ☆23 · Aug 15, 2024 · Updated last year
- ☆32 · Feb 4, 2025 · Updated last year
- Open source cross-platform implementation of MRCP protocol ☆20 · Mar 3, 2022 · Updated 4 years ago
- An enterprise-grade Chinese-English code switch punctuator from funasr. ☆33 · Apr 26, 2024 · Updated 2 years ago
- A python package to build AI-powered real-time audio applications ☆1,987 · Feb 12, 2025 · Updated last year
- ☆15 · Jul 11, 2022 · Updated 3 years ago
- ☆67 · Apr 3, 2023 · Updated 3 years ago
- Making More of Little Data: Improving Low-Resource Automatic Speech Recognition Using Data Augmentation ☆18 · May 17, 2023 · Updated 3 years ago
- VoxSRC2022 workshop development kit ☆19 · Jul 21, 2022 · Updated 3 years ago
- Verbatim Automatic Speech Recognition with improved word-level timestamps and filler detection ☆957 · Jun 3, 2025 · Updated last year
- GitHub repository for ACL 2025 paper: VoxEval: Benchmarking the Knowledge Understanding Capabilities of End-to-End Spoken Language Models ☆24 · Jun 16, 2025 · Updated last year
- Speaker change detection using SincNet and an LSTM/Transformer ☆58 · May 26, 2025 · Updated last year
- Faster Whisper transcription with CTranslate2 ☆23,584 · Nov 19, 2025 · Updated 7 months ago
- Multilingual Automatic Speech Recognition with word-level timestamps and confidence ☆2,818 · Sep 9, 2025 · Updated 9 months ago
- WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization) ☆22,462 · Jun 3, 2026 · Updated 2 weeks ago
- DAMO FSMN VAD C++ inference service ☆18 · Apr 17, 2023 · Updated 3 years ago
- ☆24 · Oct 27, 2021 · Updated 4 years ago
- streaming speech to text server using Whisper ☆102 · Jun 2, 2023 · Updated 3 years ago