Fine-tune the Whisper speech recognition model to support training without timestamp data, training with timestamp data, and training without speech data. Accelerate inference and support Web deployment, Windows desktop deployment, and Android deployment.
☆319Dec 22, 2025Updated 6 months ago
Alternatives and similar repositories for Whisper-Finetune
Users who are interested in Whisper-Finetune are comparing it to the libraries listed below.
- Fine-tune the Whisper speech recognition model to support training without timestamp data, training with timestamp data, and training wit…☆1,216May 8, 2026Updated last month
- BELLE: Be Everyone's Large Language model Engine (open-source Chinese conversational large language model)☆8,271Oct 16, 2024Updated last year
- Torch Audio Forced Aligner for Mixed Chinese (Mandarin or Cantonese) and English.☆61Sep 5, 2025Updated 9 months ago
- (WIP) Long-form speech generation☆31Apr 2, 2025Updated last year
- Text-to-text alignment algorithm for speech recognition error analysis.☆31Jun 23, 2026Updated last week
- Industrial-grade speech recognition toolkit: 170x realtime, 50+ languages, speaker diarization, emotion detection, streaming, and OpenAI-…☆18,699Updated this week
- T5Voice is a lightweight PyTorch implementation of T5-based text-to-speech synthesis, supporting both streaming and non-streaming speech …☆28Nov 7, 2025Updated 7 months ago
- Colab notebook for fine-tuning Qwen2-Audio with trl's SFT and PPO trainers.☆24Nov 23, 2024Updated last year
- Multilingual speech understanding: ASR + emotion recognition + audio event detection. 50+ languages, 15x faster than Whisper, non-autoreg…☆8,713Updated this week
- Pseudo-real-time speech transcription service based on faster-whisper☆242Apr 29, 2025Updated last year
- Dolphin is a multilingual, multitask ASR model jointly trained by DataoceanAI and Tsinghua University.☆762Jun 11, 2026Updated 2 weeks ago
- Compute WER and SER for speech recognition evaluation☆26Jun 6, 2026Updated 3 weeks ago
- We Speech Transcript based on LLM, in 300 lines of code.☆185Jun 20, 2025Updated last year
- ☆854Jun 7, 2024Updated 2 years ago
- Multi-lingual large voice generation model, providing inference, training and deployment full-stack ability.☆20Mar 12, 2026Updated 3 months ago
- Pseudo Streaming SenseVoice with Hotwords☆458Jun 15, 2026Updated 2 weeks ago
- Text-To-Speech for NotebookLM☆39Jul 20, 2025Updated 11 months ago
- 🔥 Speech synthesis (TTS) and voice cloning tutorial: https://dataxujing.github.io/TTS-paper/#/☆11Oct 29, 2024Updated last year
- Open-source industrial-grade ASR models supporting Mandarin, Chinese dialects and English, achieving a new SOTA on public Mandarin ASR be…☆1,911Feb 25, 2026Updated 4 months ago
- Term Project at GTCMT exploring phase based features for Singing Voice Detection with Neural Networks☆11Apr 20, 2018Updated 8 years ago
- Python runtime for WeTextProcessing (does not depend on Pynini)☆52Jun 11, 2026Updated 2 weeks ago
- MooER: Moore-threads Open Omni model for speech-to-speech intERaction. MooER-omni includes a series of end-to-end speech interaction mode…☆222Jan 8, 2025Updated last year
- A simple implementation for improving CosyVoice2 by GRPO method☆38May 5, 2026Updated last month
- Causal streaming adaptation of OpenAI Whisper for real-time transcription on small audio chunks.☆74Mar 31, 2026Updated 3 months ago
- Faster Whisper transcription with CTranslate2☆23,840Nov 19, 2025Updated 7 months ago
- Awesome Neural Codec Models, Text-to-Speech Synthesizers & Speech Language Models☆243Dec 18, 2025Updated 6 months ago
- A native-PyTorch library for large scale M-LLM (text/audio) training with tp/cp/dp.☆230Apr 8, 2026Updated 2 months ago
- Onset-and-Offset-Aware Sound Event Detection☆21Feb 10, 2025Updated last year
- ☆561Jul 10, 2024Updated last year
- ☆11Dec 24, 2024Updated last year
- The official repo of Qwen-Audio (通义千问-Audio) chat & pretrained large audio language model proposed by Alibaba Cloud.☆1,908Jul 5, 2024Updated last year
- OSUM & OSUM-EChat, open speech understanding model and empathetic spoken chatbot based on it, open-sourced by ASLP@NPU.☆496Nov 23, 2025Updated 7 months ago
- Distilled variant of Whisper for speech recognition. 6x faster, 50% smaller, within 1% word error rate.☆4,086Jan 8, 2025Updated last year
- Port of Funasr's Sense-voice model in C/C++☆559Dec 19, 2025Updated 6 months ago
- 🤗 R1-AQA Model: mispeech/r1-aqa☆326Mar 28, 2025Updated last year
- A Repository for Single- and Multi-modal Speaker Verification, Speaker Recognition and Speaker Diarization☆3,005Dec 8, 2025Updated 6 months ago
- Text Normalization & Inverse Text Normalization☆788Updated this week
- Plug-and-play streaming semantic VAD for real-time full-duplex spoken dialogue systems.☆254Mar 20, 2026Updated 3 months ago
- CTC decoder with hotwords for ASR.☆36Jun 15, 2026Updated 2 weeks ago