Fine-tune the Whisper speech recognition model to support training without timestamp data, training with timestamp data, and training without speech data. Accelerate inference and support Web deployment, Windows desktop deployment, and Android deployment
☆1,200Dec 17, 2025Updated 2 months ago
Alternatives and similar repositories for Whisper-Finetune
Users who are interested in Whisper-Finetune compare it to the libraries listed below.
- Fine-tune the Whisper speech recognition model to support training without timestamp data, training with timestamp data, and training wit…☆313Dec 22, 2025Updated 2 months ago
- ☆558Jul 10, 2024Updated last year
- A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity…☆15,036Feb 28, 2026Updated last week
- [WIP] Scripts for fine-tuning Whisper☆222May 29, 2023Updated 2 years ago
- Production First and Production Ready End-to-End Speech Recognition Toolkit☆5,044Dec 19, 2025Updated 2 months ago
- The repo provides information about the KeSpeech dataset.☆172Oct 13, 2022Updated 3 years ago
- We Speech Transcript based on LLM, in 300 lines of code.☆185Jun 20, 2025Updated 8 months ago
- The official repo of Qwen-Audio (通义千问-Audio) chat & pretrained large audio language model proposed by Alibaba Cloud.☆1,879Jul 5, 2024Updated last year
- Multilingual Voice Understanding Model☆7,669Dec 30, 2025Updated 2 months ago
- Faster Whisper transcription with CTranslate2☆21,289Nov 19, 2025Updated 3 months ago
- Chinese speech pretrained models☆1,192Aug 23, 2024Updated last year
- Open-source industrial-grade ASR models supporting Mandarin, Chinese dialects and English, achieving a new SOTA on public Mandarin ASR be…☆1,788Feb 25, 2026Updated last week
- This is a repository for fine-tuning Qwen2-Audio, currently supporting Distributed Data Parallel (DDP) and DeepSpeed.☆50Jul 28, 2025Updated 7 months ago
- A Framework for Speech, Language, Audio, Music Processing with Large Language Model☆995Jan 15, 2026Updated last month
- ☆1,369Mar 3, 2026Updated last week
- ☆837Jun 7, 2024Updated last year
- Fine-tune and evaluate Whisper models for Automatic Speech Recognition (ASR) on custom datasets or datasets from huggingface.☆361May 23, 2023Updated 2 years ago
- SpeechIO Leaderboard: a large, robust, comprehensive, benchmarking platform for Automatic Speech Recognition.☆539Mar 29, 2025Updated 11 months ago
- Text Normalization & Inverse Text Normalization☆727Feb 27, 2026Updated last week
- Silero VAD: pre-trained enterprise-grade Voice Activity Detector☆8,384Updated this week
- Production First and Production Ready End-to-End Text-to-Speech Toolkit☆416Nov 20, 2025Updated 3 months ago
- Chinese text normalization for speech processing☆722Mar 18, 2023Updated 2 years ago
- ☆86Jul 31, 2025Updated 7 months ago
- The dataset of Speech Recognition☆453Jan 4, 2026Updated 2 months ago
- SALMONN family: A suite of advanced multi-modal LLMs☆1,391Feb 3, 2026Updated last month
- Whisper realtime streaming for long speech-to-text transcription and translation☆3,546Nov 12, 2025Updated 3 months ago
- A Repository for Single- and Multi-modal Speaker Verification, Speaker Recognition and Speaker Diarization☆2,811Dec 8, 2025Updated 3 months ago
- Pseudo Streaming SenseVoice with Hotwords☆434Mar 13, 2025Updated 11 months ago
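- 📣 Commercial-grade open-source automatic speech recognition library: works out of the box, supports all platforms, and recognizes mixed Chinese and English speech. A cross-platform implementation of ASR inference. It's based on ONNXRuntime and FunASR. We provide …☆598May 15, 2024Updated last year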
- An AI-Powered Speech Processing Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Enhancement, Separation, and Target Spe…☆3,951Aug 14, 2025Updated 6 months ago
- Dolphin is a multilingual, multitask ASR model jointly trained by DataoceanAI and Tsinghua University.☆697Nov 27, 2025Updated 3 months ago
- KAN-TTS is a speech-synthesis training framework, please try the demos we have posted at https://modelscope.cn/models?page=1&tasks=text-…☆526Dec 28, 2023Updated 2 years ago
- Distilled variant of Whisper for speech recognition. 6x faster, 50% smaller, within 1% word error rate.☆4,049Jan 8, 2025Updated last year
- WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)☆20,556Feb 22, 2026Updated 2 weeks ago
- EMNLP 23 - Integrating Whisper Encoder to LLaMA Decoder for Generative ASR Error Correction☆269May 19, 2024Updated last year
- An evolving, large-scale and multi-domain ASR corpus for low-resource languages with automated crawling, transcription and refinement☆185Sep 1, 2025Updated 6 months ago
- Chinese punctuation model that adds punctuation marks to text.☆147Dec 24, 2024Updated last year
- This is the code for the SpeechTokenizer presented in the SpeechTokenizer: Unified Speech Tokenizer for Speech Language Models. Samples a…☆649Jun 9, 2024Updated last year
- Best-practice TTS based on BERT and VITS with some features from Microsoft's NaturalSpeech; supports ONNX streaming output!☆1,227Feb 5, 2024Updated 2 years ago