Fine-tune the Whisper speech recognition model to support training without timestamp data, training with timestamp data, and training without speech data. Accelerate inference and support Web deployment, Windows desktop deployment, and Android deployment.
☆1,216 · May 8, 2026 · Updated last month
Alternatives and similar repositories for Whisper-Finetune
Users who are interested in Whisper-Finetune are comparing it to the libraries listed below.
- Fine-tune the Whisper speech recognition model to support training without timestamp data, training with timestamp data, and training wit… ☆319 · Dec 22, 2025 · Updated 5 months ago
- ☆560 · Jul 10, 2024 · Updated last year
- [WIP] Scripts for fine-tuning Whisper ☆221 · May 29, 2023 · Updated 3 years ago
- Production First and Production Ready End-to-End Speech Recognition Toolkit ☆5,131 · May 11, 2026 · Updated 3 weeks ago
- Industrial-grade speech recognition toolkit: 170x realtime, 50+ languages, speaker diarization, emotion detection, streaming, and OpenAI-… ☆17,657 · Updated this week
- Fine-tune and evaluate Whisper models for Automatic Speech Recognition (ASR) on custom datasets or datasets from huggingface. ☆363 · May 23, 2023 · Updated 3 years ago
- The repo provides information about KeSpeech dataset. ☆178 · Oct 13, 2022 · Updated 3 years ago
- This is a repository for fine-tuning Qwen2-Audio, currently supporting Distributed Data Parallel (DDP) and DeepSpeed. ☆51 · Jul 28, 2025 · Updated 10 months ago
- We Speech Transcript based on LLM, in 300 lines of code. ☆184 · Jun 20, 2025 · Updated 11 months ago
- Chinese speech pretrained models ☆1,203 · Aug 23, 2024 · Updated last year
- ☆1,420 · Jun 1, 2026 · Updated last week
- The official repo of Qwen-Audio (通义千问-Audio) chat & pretrained large audio language model proposed by Alibaba Cloud. ☆1,899 · Jul 5, 2024 · Updated last year
- Faster Whisper transcription with CTranslate2 ☆23,408 · Nov 19, 2025 · Updated 6 months ago
- A Framework for Speech, Language, Audio, Music Processing with Large Language Model ☆1,035 · Jan 15, 2026 · Updated 4 months ago
- Open-source industrial-grade ASR models supporting Mandarin, Chinese dialects and English, achieving a new SOTA on public Mandarin ASR be… ☆1,899 · Feb 25, 2026 · Updated 3 months ago
- Multilingual speech understanding: ASR + emotion recognition + audio event detection. 50+ languages, 15x faster than Whisper, non-autoreg… ☆8,497 · Updated this week
- SpeechIO Leaderboard: a large, robust, comprehensive, benchmarking platform for Automatic Speech Recognition. ☆546 · Mar 29, 2025 · Updated last year
- ☆88 · Jul 31, 2025 · Updated 10 months ago
- SALMONN family: A suite of advanced multi-modal LLMs ☆1,443 · May 26, 2026 · Updated 2 weeks ago
- Chinese text normalization for speech processing ☆731 · Mar 18, 2023 · Updated 3 years ago
- Text Normalization & Inverse Text Normalization ☆780 · Updated this week
- Silero VAD: pre-trained enterprise-grade Voice Activity Detector ☆9,252 · Mar 26, 2026 · Updated 2 months ago
- ☆850 · Jun 7, 2024 · Updated 2 years ago
- The dataset of Speech Recognition ☆459 · Jan 4, 2026 · Updated 5 months ago
- Production First and Production Ready End-to-End Text-to-Speech Toolkit ☆416 · Nov 20, 2025 · Updated 6 months ago
- Dolphin is a multilingual, multitask ASR model jointly trained by DataoceanAI and Tsinghua University. ☆755 · May 14, 2026 · Updated 3 weeks ago
- Whisper realtime streaming for long speech-to-text transcription and translation ☆3,633 · Nov 12, 2025 · Updated 6 months ago
- A Repository for Single- and Multi-modal Speaker Verification, Speaker Recognition and Speaker Diarization ☆2,970 · Dec 8, 2025 · Updated 6 months ago
- A Chinese punctuation model that adds punctuation marks to text. ☆146 · Dec 24, 2024 · Updated last year
- An AI-Powered Speech Processing Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Enhancement, Separation, and Target Spe… ☆4,214 · Aug 14, 2025 · Updated 9 months ago
- FSA/FST algorithms, differentiable, with PyTorch compatibility. ☆1,342 · May 20, 2026 · Updated 3 weeks ago
- End-to-End Speech Processing Toolkit ☆9,855 · Updated this week
- Pseudo Streaming SenseVoice with Hotwords ☆455 · Mar 13, 2025 · Updated last year
- A curated list of awesome papers on contextualizing E2E ASR outputs ☆81 · May 10, 2023 · Updated 3 years ago
- Distilled variant of Whisper for speech recognition. 6x faster, 50% smaller, within 1% word error rate. ☆4,081 · Jan 8, 2025 · Updated last year
- ✨✨Freeze-Omni: A Smart and Low Latency Speech-to-speech Dialogue Model with Frozen LLM ☆381 · May 27, 2025 · Updated last year
- EMNLP 23 - Integrating Whisper Encoder to LLaMA Decoder for Generative ASR Error Correction ☆271 · May 19, 2024 · Updated 2 years ago
- WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization) ☆22,308 · Jun 3, 2026 · Updated last week
- An evolving, large-scale and multi-domain ASR corpus for low-resource languages with automated crawling, transcription and refinement ☆193 · Apr 28, 2026 · Updated last month