shuaijiang/Whisper-Finetune

shuaijiang / Whisper-Finetune

Fine-tune the Whisper speech recognition model to support training without timestamp data, training with timestamp data, and training without speech data. Accelerate inference and support Web deployment, Windows desktop deployment, and Android deployment.

☆318

Alternatives and similar repositories for Whisper-Finetune

Users that are interested in Whisper-Finetune are comparing it to the libraries listed below.

yeyupiaoling / Whisper-Finetune
Fine-tune the Whisper speech recognition model to support training without timestamp data, training with timestamp data, and training wit…
☆1,218 · May 8, 2026 · Updated 2 months ago
LianjiaTech / BELLE
BELLE: Be Everyone's Large Language model Engine (open-source Chinese conversational large language model)
☆8,273 · Oct 16, 2024 · Updated last year
Mddct / simple-tts
(WIP) Long-form speech generation
☆30 · Apr 2, 2025 · Updated last year
pengzhendong / torchfa
Torch Audio Forced Aligner for Mixed Chinese (Mandarin or Cantonese) and English.
☆61 · Sep 5, 2025 · Updated 10 months ago
DataoceanAI / Dolphin
Dolphin is a multilingual, multitask ASR model jointly trained by DataoceanAI and Tsinghua University.
☆772 · Jun 11, 2026 · Updated last month
ultrasev / stream-whisper
A pseudo-real-time speech transcription service based on faster-whisper
☆241 · Apr 29, 2025 · Updated last year
FireRedTeam / FireRedASR
Open-source industrial-grade ASR models supporting Mandarin, Chinese dialects and English, achieving a new SOTA on public Mandarin ASR be…
☆1,937 · Feb 25, 2026 · Updated 4 months ago
FunAudioLLM / SenseVoice
Open-source SenseVoiceSmall model for Mandarin, Cantonese, English, Japanese, and Korean ASR, language ID, emotion recognition, and audio…
☆8,911 · Updated this week
modelscope / FunASR
Open-source speech recognition toolkit for training, inference, streaming ASR, VAD, punctuation, speaker diarization pipelines, and OpenA…
☆19,387 · Updated this week
Tele-AI / TeleSpeech-ASR
☆855 · Jun 7, 2024 · Updated 2 years ago
wenet-e2e / wesr
We Speech Transcript based on LLM, in 300 lines of code.
☆182 · Jun 20, 2025 · Updated last year
corticph / error-align
Text-to-text alignment algorithm for speech recognition error analysis.
☆31 · Jun 23, 2026 · Updated 3 weeks ago
hexisyztem / CosyVoice
Multi-lingual large voice generation model, providing inference, training and deployment full-stack ability.
☆20 · Mar 12, 2026 · Updated 4 months ago
zhu-han / SpeechLLM
LLM-based ASR recipe with Zipformer encoder and Qwen LLM
☆34 · Sep 25, 2025 · Updated 9 months ago
pengzhendong / streaming-sensevoice
Pseudo Streaming SenseVoice with Hotwords
☆466 · Jun 15, 2026 · Updated last month
lifeiteng / NotebookTTS
Text-To-Speech for NotebookLM
☆39 · Jul 20, 2025 · Updated last year
DataXujing / TTS-paper
🔥 Speech synthesis (TTS) and voice cloning tutorial: https://dataxujing.github.io/TTS-paper/#/
☆11 · Oct 29, 2024 · Updated last year
pengzhendong / audiolab
A streaming audio reader, processor, and writer built on top of soundfile and PyAV (bindings for FFmpeg)
☆39 · Mar 31, 2026 · Updated 3 months ago
MooreThreads / MooER
MooER: Moore-threads Open Omni model for speech-to-speech intERaction. MooER-omni includes a series of end-to-end speech interaction mode…
☆219 · Jan 8, 2025 · Updated last year
pengzhendong / compute-wer
Compute WER and SER for speech recognition evaluation
☆27 · Jun 6, 2026 · Updated last month
XiaomiMiMo / MiMo-Audio-Training
☆109 · Oct 16, 2025 · Updated 9 months ago
pengzhendong / speaker-diarization
Offline Speaker Diarization with SenseVoice by Sherpa ONNX.
☆15 · Dec 23, 2024 · Updated last year
lovemefan / SenseVoice.cpp
Port of FunASR's SenseVoice model in C/C++
☆568 · Dec 19, 2025 · Updated 7 months ago
jonflynng / qwen2-audio-finetune
Colab notebook for fine-tuning Qwen2-Audio with trl's SFT and PPO trainers.
☆24 · Nov 23, 2024 · Updated last year
xingchensong / TouchNet
A native-PyTorch library for large scale M-LLM (text/audio) training with tp/cp/dp.
☆232 · Jul 2, 2026 · Updated 2 weeks ago
Vaibhavs10 / fast-whisper-finetuning
☆562 · Jul 10, 2024 · Updated 2 years ago
QwenLM / Qwen-Audio
The official repo of Qwen-Audio (通义千问-Audio) chat & pretrained large audio language model proposed by Alibaba Cloud.
☆1,914 · Jul 5, 2024 · Updated 2 years ago
LqNoob / Neural-Codec-and-Speech-Language-Models
Awesome Neural Codec Models, Text-to-Speech Synthesizers & Speech Language Models
☆246 · Jul 9, 2026 · Updated last week
nkilm / offline-whisperx
Run different pipelines of WhisperX - Transcription, Diarization, VAD, Alignment completely OFFLINE.
☆48 · Mar 30, 2025 · Updated last year
wenet-e2e / west
We Speech Toolkit, LLM based Speech Toolkit for Speech Understanding, Generation, and Interaction
☆206 · Updated this week
ASLP-lab / OSUM
OSUM & OSUM-EChat, open speech understanding model and empathetic spoken chatbot based on it, open-sourced by ASLP@NPU.
☆494 · Nov 23, 2025 · Updated 7 months ago
ASLP-lab / WenetSpeech-Chuan
Official repository for the WenetSpeech-Chuan dataset.
☆218 · Jul 14, 2026 · Updated last week
huggingface / distil-whisper
Distilled variant of Whisper for speech recognition. 6x faster, 50% smaller, within 1% word error rate.
☆4,091 · Jan 8, 2025 · Updated last year
k2-fsa / ZipVoice
Fast and High-Quality Zero-Shot Text-to-Speech with Flow Matching
☆1,016 · Dec 2, 2025 · Updated 7 months ago
SYSTRAN / faster-whisper
Faster Whisper transcription with CTranslate2
☆24,424 · Nov 19, 2025 · Updated 8 months ago
xiaomi-research / r1-aqa
🤗 R1-AQA Model: mispeech/r1-aqa
☆325 · Mar 28, 2025 · Updated last year
tomer9080 / WhisperRT-Streaming
Causal streaming adaptation of OpenAI Whisper for real-time transcription on small audio chunks.
☆75 · Mar 31, 2026 · Updated 3 months ago
wenet-e2e / WeTextProcessing
Text Normalization & Inverse Text Normalization
☆802 · Jun 26, 2026 · Updated 3 weeks ago
xingchensong / FlashCosyVoice
FlashCosyVoice: A lightweight vLLM implementation built from scratch for CosyVoice.
☆250 · Feb 25, 2026 · Updated 4 months ago