vasistalodagala / whisper-finetune
Fine-tune and evaluate Whisper models for Automatic Speech Recognition (ASR) on custom datasets or datasets from Hugging Face.
☆360May 23, 2023Updated 2 years ago
Alternatives and similar repositories for whisper-finetune
Users that are interested in whisper-finetune are comparing it to the libraries listed below
- ☆558Jul 10, 2024Updated last year
- [WIP] Scripts for fine-tuning Whisper☆222May 29, 2023Updated 2 years ago
- EMNLP 23 - Integrating Whisper Encoder to LLaMA Decoder for Generative ASR Error Correction☆267May 19, 2024Updated last year
- Whisper finetuned on VinBigdata-VLSP2020-100h + KenLM☆37Oct 6, 2023Updated 2 years ago
- Speaker-aware CTC (SACTC) for multi-talker overlapped speech recognition.☆21May 26, 2025Updated 8 months ago
- Fine-tune the Whisper speech recognition model to support training without timestamp data, training with timestamp data, and training wit…☆1,192Dec 17, 2025Updated 2 months ago
- INTERSPEECH 23 - Refunction Whisper to recognize new tasks with adapters!☆43Sep 11, 2023Updated 2 years ago
- Whisper fine-tuning event script to use multiple hf datasets☆32Dec 20, 2022Updated 3 years ago
- A simple command line tool to calculate WER for ASR.☆14Oct 14, 2024Updated last year
- ☆20Sep 2, 2024Updated last year
- Finetune Wav2vec 2.0 For Speech Recognition☆145Feb 6, 2025Updated last year
- SoloAudio: Target Sound Extraction with Language-oriented Audio Diffusion Transformer.☆113Jan 28, 2026Updated 2 weeks ago
- Official implementation for Fast-HuBERT: An Efficient Training Framework for Self-Supervised Speech Representation Learning☆96Nov 20, 2024Updated last year
- LoRA-based phoneme/prosody control for LLM-based TTS with no G2P - Lightweight adapter for editing and controlling the target language's phoneme…☆22Aug 14, 2025Updated 6 months ago
- Multilingual Automatic Speech Recognition with word-level timestamps and confidence☆2,759Sep 9, 2025Updated 5 months ago
- Fine-Tune Whisper with Transformers and PEFT☆58Nov 4, 2023Updated 2 years ago
- [ACII 2023] PEFT-SER: On the Use of Parameter Efficient Transfer Learning Approaches For Speech Emotion Recognition Using Pre-trained Spe…☆60Jul 1, 2024Updated last year
- The reproduction code for Qwen-Audio Fine-tuning☆53Aug 15, 2024Updated last year
- Source code and demo for INTERSPEECH 2023 paper: DuTa-VC: A Duration-aware Typical-to-atypical Voice Conversion Approach with Diffusion P…☆37Dec 5, 2023Updated 2 years ago
- Vocoder-Free Non-Parallel Conversion of Whispered Speech With Masked Cycle-Consistent Generative Adversarial Networks☆17Aug 18, 2023Updated 2 years ago
- Descript Audio Codec - VAE Variant (.dac-vae): High-Fidelity Audio Compression with Variational Autoencoder☆31Aug 30, 2025Updated 5 months ago
- System that ranks 2nd in DCASE 2022 Challenge Task 5: Few-shot Bioacoustic Event Detection☆28Jul 6, 2022Updated 3 years ago
- 56 language, 1 model Multilingual ASR☆24Jul 25, 2021Updated 4 years ago
- Experimental code: sound file preprocessing to optimize Whisper transcriptions without hallucinated texts☆348Nov 12, 2024Updated last year
- Whisper realtime streaming for long speech-to-text transcription and translation☆3,530Nov 12, 2025Updated 3 months ago
- ☆323Jun 14, 2024Updated last year
- ☆389Sep 3, 2024Updated last year
- [APSIPA'22] Exploring Speaker Age Estimation on Different Self-Supervised Learning Models☆14Oct 19, 2022Updated 3 years ago
- Unofficial implementation of ConvNeXt-TTS powered by lightning☆18Oct 20, 2024Updated last year
- Code for the winning solution in the SE&R 2022 Challenge - SER track.☆16Mar 28, 2023Updated 2 years ago
- Compendium for the paper "Transparent pronunciation scoring using articulatorily weighted phoneme edit distance" by Karhila, Smolander, Y…☆25May 6, 2019Updated 6 years ago
- ☆32Dec 23, 2025Updated last month
- This repository contains audio samples and supplementary materials accompanying publications by the "Speaker, Voice and Language" team at…☆439Aug 12, 2025Updated 6 months ago
- Verbatim Automatic Speech Recognition with improved word-level timestamps and filler detection☆895Jun 3, 2025Updated 8 months ago
- Automatic speech annotator processing speech with voice activity detection, overlapping speech detection, speaker diarization and automat…☆33Jun 14, 2024Updated last year
- ☆32Jan 6, 2022Updated 4 years ago
- PodcastMix A dataset for separating music and speech in podcasts.☆44Aug 20, 2024Updated last year
- Official implementation of the paper "Laughter Synthesis using Pseudo Phonetic Tokens with a Large-scale In-the-wild Laughter Corpus" acc…☆77Jul 16, 2023Updated 2 years ago
- The open source code for LLM-Codec☆145Aug 18, 2024Updated last year