Fine-tune the Whisper speech recognition model, with support for training without timestamp data, training with timestamp data, and training without speech data. Also accelerates inference and supports web, Windows desktop, and Android deployment.
☆1,204 · Dec 17, 2025 · Updated 3 months ago
Alternatives and similar repositories for Whisper-Finetune
Users that are interested in Whisper-Finetune are comparing it to the libraries listed below.
- Fine-tune the Whisper speech recognition model to support training without timestamp data, training with timestamp data, and training wit… ☆316 · Dec 22, 2025 · Updated 3 months ago
- ☆560 · Jul 10, 2024 · Updated last year
- [WIP] Scripts for fine-tuning Whisper ☆221 · May 29, 2023 · Updated 2 years ago
- Production First and Production Ready End-to-End Speech Recognition Toolkit ☆5,065 · Updated this week
- A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity… ☆15,456 · Mar 17, 2026 · Updated 2 weeks ago
- Fine-tune and evaluate Whisper models for Automatic Speech Recognition (ASR) on custom datasets or datasets from Hugging Face. ☆360 · May 23, 2023 · Updated 2 years ago
- The repo provides information about KeSpeech dataset. ☆174 · Oct 13, 2022 · Updated 3 years ago
- This is a repository for fine-tuning Qwen2-Audio, currently supporting Distributed Data Parallel (DDP) and DeepSpeed. ☆51 · Jul 28, 2025 · Updated 8 months ago
- We Speech Transcript based on LLM, in 300 lines of code. ☆185 · Jun 20, 2025 · Updated 9 months ago
- Chinese speech pretrained models ☆1,196 · Aug 23, 2024 · Updated last year
- ☆1,383 · Mar 25, 2026 · Updated last week
- Faster Whisper transcription with CTranslate2 ☆21,906 · Nov 19, 2025 · Updated 4 months ago
- The official repo of Qwen-Audio (通义千问-Audio) chat & pretrained large audio language model proposed by Alibaba Cloud. ☆1,882 · Jul 5, 2024 · Updated last year
- A Framework for Speech, Language, Audio, Music Processing with Large Language Model ☆1,012 · Jan 15, 2026 · Updated 2 months ago
- Open-source industrial-grade ASR models supporting Mandarin, Chinese dialects and English, achieving a new SOTA on public Mandarin ASR be… ☆1,825 · Feb 25, 2026 · Updated last month
- Multilingual Voice Understanding Model ☆7,880 · Dec 30, 2025 · Updated 3 months ago
- SpeechIO Leaderboard: a large, robust, comprehensive, benchmarking platform for Automatic Speech Recognition. ☆542 · Mar 29, 2025 · Updated last year
- ☆86 · Jul 31, 2025 · Updated 8 months ago
- SALMONN family: A suite of advanced multi-modal LLMs ☆1,397 · Feb 3, 2026 · Updated 2 months ago
- Silero VAD: pre-trained enterprise-grade Voice Activity Detector ☆8,643 · Mar 26, 2026 · Updated last week
- ☆839 · Jun 7, 2024 · Updated last year
- The dataset of Speech Recognition ☆456 · Jan 4, 2026 · Updated 2 months ago
- Chinese text normalization for speech processing ☆724 · Mar 18, 2023 · Updated 3 years ago
- Text Normalization & Inverse Text Normalization ☆737 · Feb 27, 2026 · Updated last month
- Production First and Production Ready End-to-End Text-to-Speech Toolkit ☆417 · Nov 20, 2025 · Updated 4 months ago
- Whisper realtime streaming for long speech-to-text transcription and translation ☆3,583 · Nov 12, 2025 · Updated 4 months ago
- Dolphin is a multilingual, multitask ASR model jointly trained by DataoceanAI and Tsinghua University. ☆700 · Mar 19, 2026 · Updated 2 weeks ago
- A Repository for Single- and Multi-modal Speaker Verification, Speaker Recognition and Speaker Diarization ☆2,863 · Dec 8, 2025 · Updated 3 months ago
- An AI-Powered Speech Processing Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Enhancement, Separation, and Target Spe… ☆4,011 · Aug 14, 2025 · Updated 7 months ago
- A Chinese punctuation model that can add punctuation marks to text. ☆146 · Dec 24, 2024 · Updated last year
- FSA/FST algorithms, differentiable, with PyTorch compatibility. ☆1,320 · Mar 9, 2026 · Updated 3 weeks ago
- End-to-End Speech Processing Toolkit ☆9,792 · Updated this week
- Pseudo Streaming SenseVoice with Hotwords ☆442 · Mar 13, 2025 · Updated last year
- A curated list of awesome papers on contextualizing E2E ASR outputs ☆80 · May 10, 2023 · Updated 2 years ago
- Distilled variant of Whisper for speech recognition. 6x faster, 50% smaller, within 1% word error rate. ☆4,061 · Jan 8, 2025 · Updated last year
- WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization) ☆20,952 · Mar 25, 2026 · Updated last week
- EMNLP 23 - Integrating Whisper Encoder to LLaMA Decoder for Generative ASR Error Correction ☆270 · May 19, 2024 · Updated last year
- An evolving, large-scale and multi-domain ASR corpus for low-resource languages with automated crawling, transcription and refinement ☆187 · Sep 1, 2025 · Updated 7 months ago
- A CTC decoder for both online and offline ASR models ☆66 · Nov 18, 2023 · Updated 2 years ago