Fine-tune the Whisper speech recognition model to support training without timestamp data, training with timestamp data, and training without speech data. Accelerate inference and support Web deployment, Windows desktop deployment, and Android deployment.
☆319Dec 22, 2025Updated 5 months ago
Alternatives and similar repositories for Whisper-Finetune
Users that are interested in Whisper-Finetune are comparing it to the libraries listed below.
- Fine-tune the Whisper speech recognition model to support training without timestamp data, training with timestamp data, and training wit…☆1,216May 8, 2026Updated last month
- BELLE: Be Everyone's Large Language model Engine (open-source Chinese conversational large language model)☆8,275Oct 16, 2024Updated last year
- Torch Audio Forced Aligner for Mixed Chinese (Mandarin or Cantonese) and English.☆61Sep 5, 2025Updated 9 months ago
- (WIP) Long-form speech generation☆31Apr 2, 2025Updated last year
- Text-to-text alignment algorithm for speech recognition error analysis.☆30Apr 6, 2026Updated 2 months ago
- Industrial-grade speech recognition toolkit: 170x realtime, 50+ languages, speaker diarization, emotion detection, streaming, and OpenAI-…☆17,657Updated this week
- T5Voice is a lightweight PyTorch implementation of T5-based text-to-speech synthesis, supporting both streaming and non-streaming speech …☆28Nov 7, 2025Updated 7 months ago
- Colab notebook for fine-tuning Qwen2-Audio with trl's SFT and PPO trainers.☆24Nov 23, 2024Updated last year
- Multilingual speech understanding: ASR + emotion recognition + audio event detection. 50+ languages, 15x faster than Whisper, non-autoreg…☆8,497Updated this week
- Pseudo-real-time speech transcription service based on faster-whisper☆242Apr 29, 2025Updated last year
- Dolphin is a multilingual, multitask ASR model jointly trained by DataoceanAI and Tsinghua University.☆755May 14, 2026Updated 3 weeks ago
- Compute WER and SER for speech recognition evaluation☆26Mar 18, 2026Updated 2 months ago
- We Speech Transcript based on LLM, in 300 lines of code.☆184Jun 20, 2025Updated 11 months ago
- ☆850Jun 7, 2024Updated 2 years ago
- Multi-lingual large voice generation model, providing inference, training and deployment full-stack ability.☆20Mar 12, 2026Updated 2 months ago
- Pseudo Streaming SenseVoice with Hotwords☆455Mar 13, 2025Updated last year
- Text-To-Speech for NotebookLM☆39Jul 20, 2025Updated 10 months ago
- 🔥 Speech synthesis (TTS) and voice cloning tutorial: https://dataxujing.github.io/TTS-paper/#/☆11Oct 29, 2024Updated last year
- Open-source industrial-grade ASR models supporting Mandarin, Chinese dialects and English, achieving a new SOTA on public Mandarin ASR be…☆1,899Feb 25, 2026Updated 3 months ago
- Term Project at GTCMT exploring phase based features for Singing Voice Detection with Neural Networks☆11Apr 20, 2018Updated 8 years ago
- Python runtime for WeTextProcessing (does not depend on Pynini)☆52Nov 28, 2025Updated 6 months ago
- MooER: Moore-threads Open Omni model for speech-to-speech intERaction. MooER-omni includes a series of end-to-end speech interaction mode…☆220Jan 8, 2025Updated last year
- A simple implementation for improving CosyVoice2 by GRPO method☆38May 5, 2026Updated last month
- Causal streaming adaptation of OpenAI Whisper for real-time transcription on small audio chunks.☆73Mar 31, 2026Updated 2 months ago
- Faster Whisper transcription with CTranslate2☆23,408Nov 19, 2025Updated 6 months ago
- Awesome Neural Codec Models, Text-to-Speech Synthesizers & Speech Language Models☆242Dec 18, 2025Updated 5 months ago
- A native-PyTorch library for large scale M-LLM (text/audio) training with tp/cp/dp.☆230Apr 8, 2026Updated 2 months ago
- A tool for calculating WER (Word Error Rate) in python.☆14Sep 18, 2024Updated last year
- Onset-and-Offset-Aware Sound Event Detection☆21Feb 10, 2025Updated last year
- ☆560Jul 10, 2024Updated last year
- ☆11Dec 24, 2024Updated last year
- The official repo of Qwen-Audio (通义千问-Audio) chat & pretrained large audio language model proposed by Alibaba Cloud.☆1,899Jul 5, 2024Updated last year
- OSUM & OSUM-EChat, open speech understanding model and empathetic spoken chatbot based on it, open-sourced by ASLP@NPU.☆495Nov 23, 2025Updated 6 months ago
- Distilled variant of Whisper for speech recognition. 6x faster, 50% smaller, within 1% word error rate.☆4,081Jan 8, 2025Updated last year
- Port of Funasr's Sense-voice model in C/C++☆551Dec 19, 2025Updated 5 months ago
- 🤗 R1-AQA Model: mispeech/r1-aqa☆324Mar 28, 2025Updated last year
- A Repository for Single- and Multi-modal Speaker Verification, Speaker Recognition and Speaker Diarization☆2,970Dec 8, 2025Updated 6 months ago
- Text Normalization & Inverse Text Normalization☆780Updated this week
- Plug-and-play streaming semantic VAD for real-time full-duplex spoken dialogue systems.☆241Mar 20, 2026Updated 2 months ago