Fine-tune the Whisper speech recognition model to support training without timestamp data, training with timestamp data, and training without speech data. Accelerate inference and support Web deployment, Windows desktop deployment, and Android deployment.
☆319 Dec 22, 2025 Updated 4 months ago
Alternatives and similar repositories for Whisper-Finetune
Users that are interested in Whisper-Finetune are comparing it to the libraries listed below.
- Fine-tune the Whisper speech recognition model to support training without timestamp data, training with timestamp data, and training wit… ☆1,211 Updated this week
- BELLE: Be Everyone's Large Language model Engine (open-source Chinese dialogue large language model) ☆8,279 Oct 16, 2024 Updated last year
- Torch Audio Forced Aligner for Mixed Chinese (Mandarin or Cantonese) and English. ☆61 Sep 5, 2025 Updated 8 months ago
- (WIP) Long-form speech generation ☆31 Apr 2, 2025 Updated last year
- Text-to-text alignment algorithm for speech recognition error analysis. ☆29 Apr 6, 2026 Updated last month
- A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity… ☆16,012 Mar 17, 2026 Updated last month
- T5Voice is a lightweight PyTorch implementation of T5-based text-to-speech synthesis, supporting both streaming and non-streaming speech … ☆28 Nov 7, 2025 Updated 6 months ago
- Colab notebook for fine-tuning Qwen2-Audio with trl's SFT and PPO trainers. ☆24 Nov 23, 2024 Updated last year
- Multilingual Voice Understanding Model ☆8,125 Dec 30, 2025 Updated 4 months ago
- A pseudo-real-time speech transcription service based on faster-whisper ☆241 Apr 29, 2025 Updated last year
- Dolphin is a multilingual, multitask ASR model jointly trained by DataoceanAI and Tsinghua University. ☆706 Updated this week
- Compute WER and SER for speech recognition evaluation ☆27 Mar 18, 2026 Updated last month
- We Speech Transcript based on LLM, in 300 lines of code. ☆185 Jun 20, 2025 Updated 10 months ago
- ☆847 Jun 7, 2024 Updated last year
- Multi-lingual large voice generation model, providing inference, training and deployment full-stack ability. ☆20 Mar 12, 2026 Updated 2 months ago
- Pseudo Streaming SenseVoice with Hotwords ☆448 Mar 13, 2025 Updated last year
- Text-To-Speech for NotebookLM ☆39 Jul 20, 2025 Updated 9 months ago
- 🔥 Speech synthesis (TTS) and voice cloning tutorial: https://dataxujing.github.io/TTS-paper/#/ ☆11 Oct 29, 2024 Updated last year
- Open-source industrial-grade ASR models supporting Mandarin, Chinese dialects and English, achieving a new SOTA on public Mandarin ASR be… ☆1,865 Feb 25, 2026 Updated 2 months ago
- Python runtime for WeTextProcessing (does not depend on Pynini) ☆50 Nov 28, 2025 Updated 5 months ago
- MooER: Moore-threads Open Omni model for speech-to-speech intERaction. MooER-omni includes a series of end-to-end speech interaction mode… ☆218 Jan 8, 2025 Updated last year
- A simple implementation for improving CosyVoice2 by GRPO method ☆38 May 5, 2026 Updated last week
- Causal streaming adaptation of OpenAI Whisper for real-time transcription on small audio chunks. ☆71 Mar 31, 2026 Updated last month
- Faster Whisper transcription with CTranslate2 ☆22,691 Nov 19, 2025 Updated 5 months ago
- Awesome Neural Codec Models, Text-to-Speech Synthesizers & Speech Language Models ☆242 Dec 18, 2025 Updated 4 months ago
- A native-PyTorch library for large scale M-LLM (text/audio) training with tp/cp/dp. ☆230 Apr 8, 2026 Updated last month
- A tool for calculating WER (Word Error Rate) in python. ☆14 Sep 18, 2024 Updated last year
- Onset-and-Offset-Aware Sound Event Detection ☆21 Feb 10, 2025 Updated last year
- ☆559 Jul 10, 2024 Updated last year
- ☆11 Dec 24, 2024 Updated last year
- The official repo of Qwen-Audio (通义千问-Audio) chat & pretrained large audio language model proposed by Alibaba Cloud. ☆1,897 Jul 5, 2024 Updated last year
- OSUM & OSUM-EChat, open speech understanding model and empathetic spoken chatbot based on it, open-sourced by ASLP@NPU. ☆491 Nov 23, 2025 Updated 5 months ago
- Distilled variant of Whisper for speech recognition. 6x faster, 50% smaller, within 1% word error rate. ☆4,081 Jan 8, 2025 Updated last year
- Port of Funasr's Sense-voice model in C/C++ ☆545 Dec 19, 2025 Updated 4 months ago
- 🤗 R1-AQA Model: mispeech/r1-aqa ☆323 Mar 28, 2025 Updated last year
- A Repository for Single- and Multi-modal Speaker Verification, Speaker Recognition and Speaker Diarization ☆2,930 Dec 8, 2025 Updated 5 months ago
- Text Normalization & Inverse Text Normalization ☆760 Feb 27, 2026 Updated 2 months ago
- CTC decoder with hotwords for ASR. ☆35 Apr 13, 2025 Updated last year
- Plug-and-play streaming semantic VAD for real-time full-duplex spoken dialogue systems. ☆211 Mar 20, 2026 Updated last month