vasistalodagala/whisper-finetune

Fine-tune and evaluate Whisper models for Automatic Speech Recognition (ASR) on custom datasets or datasets from Hugging Face.

☆365

Alternatives and similar repositories for whisper-finetune

Users who are interested in whisper-finetune are comparing it to the libraries listed below.

jumon / whisper-finetuning
[WIP] Scripts for fine-tuning Whisper
☆221 · Jul 2, 2026 · Updated 2 weeks ago
HKAB / whisper-finetune-vietnamese
Whisper finetuned on VinBigdata-VLSP2020-100h + KenLM
☆38 · Oct 6, 2023 · Updated 2 years ago
yeyupiaoling / Whisper-Finetune
Fine-tune the Whisper speech recognition model to support training without timestamp data, training with timestamp data, and training wit…
☆1,218 · May 8, 2026 · Updated 2 months ago
Srijith-rkr / Whispering-LLaMA
EMNLP 23 - Integrating Whisper Encoder to LLaMA Decoder for Generative ASR Error Correction
☆271 · May 19, 2024 · Updated 2 years ago
vb100 / whisper_ai_finetune
Fine-tune WhisperAI model to your language
☆21 · Sep 14, 2023 · Updated 2 years ago
Srijith-rkr / KAUST-Whisper-Adapter
INTERSPEECH 23 - Refunction Whisper to recognize new tasks with adapters!
☆41 · Sep 11, 2023 · Updated 2 years ago
kjw11 / Speaker-Aware-CTC
Speaker-aware CTC (SACTC) for multi-talker overlapped speech recognition.
☆22 · May 26, 2025 · Updated last year
khanld / ASR-Wav2vec-Finetune
Finetune Wav2vec 2.0 For Speech Recognition
☆149 · Feb 6, 2025 · Updated last year
nickjw0205 / Improving-ASR-with-LLM-Description
☆20 · Sep 2, 2024 · Updated last year
fengredrum / finetune-whisper-lora
Fine-Tune Whisper with Transformers and PEFT
☆58 · Nov 4, 2023 · Updated 2 years ago
huggingface / diarizers
☆327 · Jun 14, 2024 · Updated 2 years ago
bayartsogt-ya / whisper-multiple-hf-datasets
Whisper fine-tuning event script to use multiple hf datasets
☆32 · Dec 20, 2022 · Updated 3 years ago
voidful / wav2vec2-xlsr-multilingual-56
56 language, 1 model Multilingual ASR
☆24 · Jul 25, 2021 · Updated 4 years ago
aiola-lab / whisper-medusa
Whisper with Medusa heads
☆860 · Jul 2, 2026 · Updated 2 weeks ago
linto-ai / whisper-timestamped
Multilingual Automatic Speech Recognition with word-level timestamps and confidence
☆2,827 · Sep 9, 2025 · Updated 10 months ago
ufal / whisper_streaming
Whisper realtime streaming for long speech-to-text transcription and translation
☆3,652 · Nov 12, 2025 · Updated 8 months ago
pkufool / simple-wer
A simple command line tool to calculate WER for ASR.
☆14 · Oct 14, 2024 · Updated last year
nyrahealth / CrisperWhisper
Verbatim Automatic Speech Recognition with improved word-level timestamps and filler detection
☆973 · Updated this week
EtienneAb3d / WhisperHallu
Experimental code: sound file preprocessing to optimize Whisper transcriptions without hallucinated texts
☆350 · Nov 12, 2024 · Updated last year
google / speaker-id
This repository contains audio samples and supplementary materials accompanying publications by the "Speaker, Voice and Language" team at…
☆453 · Aug 12, 2025 · Updated 11 months ago
aalto-speech / interspeech2019_karhila_et_al
Compendium for the paper "Transparent pronunciation scoring using articulatorily weighted phoneme edit distance" by Karhila, Smolander, Y…
☆25 · May 6, 2019 · Updated 7 years ago
yanghaha0908 / FastHuBERT
Official implementation for Fast-HuBERT: An Efficient Training Framework for Self-Supervised Speech Representation Learning
☆100 · Nov 20, 2024 · Updated last year
QwenLM / Qwen-Audio
The official repo of Qwen-Audio (通义千问-Audio) chat & pretrained large audio language model proposed by Alibaba Cloud.
☆1,914 · Jul 5, 2024 · Updated 2 years ago
m3hrdadfi / soxan
Wav2Vec for speech recognition, classification, and audio classification
☆276 · Apr 2, 2022 · Updated 4 years ago
kyegomez / USM
Implementation of Google's USM speech model in Pytorch
☆35 · Jul 13, 2026 · Updated last week
Sreyan88 / ReCLAP
☆33 · Dec 23, 2025 · Updated 6 months ago
halsay / ASR-TTS-paper-daily
Updates ASR papers every day
☆513 · May 16, 2026 · Updated 2 months ago
huggingface / dataspeech
☆399 · Sep 3, 2024 · Updated last year
usc-sail / peft-ser
[ACII 2023] PEFT-SER: On the Use of Parameter Efficient Transfer Learning Approaches For Speech Emotion Recognition Using Pre-trained Spe…
☆60 · Jul 1, 2024 · Updated 2 years ago
zruiii / QwenAudioSFT
The reproduction code for Qwen-Audio fine-tuning
☆55 · Feb 28, 2026 · Updated 4 months ago
kingabzpro / WOLOF-ASR-Wav2Vec2
Audio Preprocessing and finetuning of wav2vec2-large-xlsr model on AI4D Baamtu Datamation - Automatic Speech Recognition in WOLOF Data.
☆18 · Nov 13, 2021 · Updated 4 years ago
gpu-poor / gramvaani_hindi_asr
This repo contains the baseline model recipes and pre-trained model for the GramVaani Hindi ASR challenge
☆16 · Mar 26, 2022 · Updated 4 years ago
bytedance / SALMONN
SALMONN family: A suite of advanced multi-modal LLMs
☆1,477 · Updated this week
haoheliu / DCASE_2022_Task_5
System that ranks 2nd in DCASE 2022 Challenge Task 5: Few-shot Bioacoustic Event Detection
☆28 · Jul 6, 2022 · Updated 4 years ago
audiodemo / voice-conversion
Vocoder-Free Non-Parallel Conversion of Whispered Speech With Masked Cycle-Consistent Generative Adversarial Networks
☆17 · Aug 18, 2023 · Updated 2 years ago
primepake / dac_vae
Descript Audio Codec - VAE Variant (.dac-vae): High-Fidelity Audio Compression with Variational Autoencoder
☆38 · Aug 30, 2025 · Updated 10 months ago
reppy4620 / convnext_tts
Unofficial implementation of ConvNeXt-TTS powered by lightning
☆18 · Oct 20, 2024 · Updated last year
Mddct / simple-tts
(WIP) long-form speech generation
☆30 · Apr 2, 2025 · Updated last year
ASR-project / Multilingual-PR
Phoneme Recognition using pre-trained models Wav2vec2, HuBERT and WavLM. Throughout this project, we compared specifically three differen…
☆266 · May 9, 2022 · Updated 4 years ago