yeyupiaoling/Whisper-Finetune

yeyupiaoling / Whisper-Finetune

Fine-tune the Whisper speech recognition model, with support for training without timestamp data, training with timestamp data, and training without speech data. Also accelerates inference and supports Web deployment, Windows desktop deployment, and Android deployment.

☆1,218

Alternatives and similar repositories for Whisper-Finetune

Users that are interested in Whisper-Finetune are comparing it to the libraries listed below.

shuaijiang / Whisper-Finetune
Fine-tune the Whisper speech recognition model to support training without timestamp data, training with timestamp data, and training wit…
☆318 · Dec 22, 2025 · Updated 7 months ago
Vaibhavs10 / fast-whisper-finetuning
☆562 · Jul 10, 2024 · Updated 2 years ago
jumon / whisper-finetuning
[WIP] Scripts for fine-tuning Whisper
☆221 · Jul 2, 2026 · Updated 2 weeks ago
wenet-e2e / wenet
Production First and Production Ready End-to-End Speech Recognition Toolkit
☆5,175 · Jun 15, 2026 · Updated last month
vasistalodagala / whisper-finetune
Fine-tune and evaluate Whisper models for Automatic Speech Recognition (ASR) on custom datasets or datasets from huggingface.
☆365 · May 23, 2023 · Updated 3 years ago
tzyll / KeSpeech
The repo provides information about KeSpeech Mandarin dialect dataset.
☆183 · Oct 13, 2022 · Updated 3 years ago
wenet-e2e / wesr
We Speech Transcript based on LLM, in 300 lines of code.
☆182 · Jun 20, 2025 · Updated last year
teamtee / Qwen2-Audio-finetune
This is a repository for fine-tuning Qwen2-Audio, currently supporting Distributed Data Parallel (DDP) and DeepSpeed.
☆50 · Jul 28, 2025 · Updated 11 months ago
TencentGameMate / chinese_speech_pretrain
chinese speech pretrained models
☆1,209 · Aug 23, 2024 · Updated last year
FireRedTeam / FireRedASR
Open-source industrial-grade ASR models supporting Mandarin, Chinese dialects and English, achieving a new SOTA on public Mandarin ASR be…
☆1,937 · Feb 25, 2026 · Updated 4 months ago
modelscope / FunASR
Open-source speech recognition toolkit for training, inference, streaming ASR, VAD, punctuation, speaker diarization pipelines, and OpenA…
☆19,387 · Updated this week
k2-fsa / icefall
☆1,456 · Updated this week
QwenLM / Qwen-Audio
The official repo of Qwen-Audio (通义千问-Audio) chat & pretrained large audio language model proposed by Alibaba Cloud.
☆1,914 · Jul 5, 2024 · Updated 2 years ago
X-LANCE / SLAM-LLM
A Framework for Speech, Language, Audio, Music Processing with Large Language Model
☆1,048 · Jan 15, 2026 · Updated 6 months ago
FunAudioLLM / SenseVoice
Open-source SenseVoiceSmall model for Mandarin, Cantonese, English, Japanese, and Korean ASR, language ID, emotion recognition, and audio…
☆8,911 · Updated this week
SpeechColab / Leaderboard
SpeechIO Leaderboard: a large, robust, comprehensive, benchmarking platform for Automatic Speech Recognition.
☆547 · Mar 29, 2025 · Updated last year
SYSTRAN / faster-whisper
Faster Whisper transcription with CTranslate2
☆24,424 · Nov 19, 2025 · Updated 8 months ago
BriansIDP / WhisperBiasing
☆88 · Jul 31, 2025 · Updated 11 months ago
bytedance / SALMONN
SALMONN family: A suite of advanced multi-modal LLMs
☆1,477 · Updated this week
speechio / chinese_text_normalization
Chinese text normalization for speech processing
☆734 · Mar 18, 2023 · Updated 3 years ago
wenet-e2e / WeTextProcessing
Text Normalization & Inverse Text Normalization
☆802 · Jun 26, 2026 · Updated 3 weeks ago
halsay / ASR-TTS-paper-daily
Update ASR paper everyday
☆513 · May 16, 2026 · Updated 2 months ago
DataoceanAI / Dolphin
Dolphin is a multilingual, multitask ASR model jointly trained by DataoceanAI and Tsinghua University.
☆772 · Jun 11, 2026 · Updated last month
Tele-AI / TeleSpeech-ASR
☆855 · Jun 7, 2024 · Updated 2 years ago
modelscope / 3D-Speaker
A Repository for Single- and Multi-modal Speaker Verification, Speaker Recognition and Speaker Diarization
☆3,060 · Dec 8, 2025 · Updated 7 months ago
double22a / speech_dataset
The dataset of Speech Recognition
☆464 · Jan 4, 2026 · Updated 6 months ago
wenet-e2e / wetts
Production First and Production Ready End-to-End Text-to-Speech Toolkit
☆416 · Nov 20, 2025 · Updated 8 months ago
snakers4 / silero-vad
Silero VAD: pre-trained enterprise-grade Voice Activity Detector
☆9,636 · Updated this week
ufal / whisper_streaming
Whisper realtime streaming for long speech-to-text transcription and translation
☆3,652 · Nov 12, 2025 · Updated 8 months ago
yeyupiaoling / PunctuationModel
A Chinese punctuation model that adds punctuation marks to text.
☆145 · Dec 24, 2024 · Updated last year
pengzhendong / streaming-sensevoice
Pseudo Streaming SenseVoice with Hotwords
☆466 · Jun 15, 2026 · Updated last month
k2-fsa / k2
FSA/FST algorithms, differentiable, with PyTorch compatibility.
☆1,348 · Jul 11, 2026 · Updated last week
modelscope / ClearerVoice-Studio
An AI-Powered Speech Processing Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Enhancement, Separation, and Target Spe…
☆4,316 · Aug 14, 2025 · Updated 11 months ago
FireRedTeam / FireRedASR2S
A SOTA Industrial-Grade All-in-One ASR system with ASR, VAD, LID, and Punc modules. FireRedASR2 supports Chinese (Mandarin, 20+ dialects/…
☆609 · Jun 2, 2026 · Updated last month
espnet / espnet
End-to-End Speech Processing Toolkit
☆9,898 · Updated this week
stevenhillis / awesome-asr-contextualization
A curated list of awesome papers on contextualizing E2E ASR outputs
☆81 · May 10, 2023 · Updated 3 years ago
huggingface / distil-whisper
Distilled variant of Whisper for speech recognition. 6x faster, 50% smaller, within 1% word error rate.
☆4,091 · Jan 8, 2025 · Updated last year
VITA-MLLM / Freeze-Omni
✨✨Freeze-Omni: A Smart and Low Latency Speech-to-speech Dialogue Model with Frozen LLM
☆388 · May 27, 2025 · Updated last year
Slyne / ctc_decoder
A ctc decoder for both online and offline asr model
☆66 · Nov 18, 2023 · Updated 2 years ago