Fine-tune the Whisper speech recognition model to support training without timestamp data, training with timestamp data, and training without speech data. Accelerate inference and support Web deployment, Windows desktop deployment, and Android deployment.
☆316Dec 22, 2025Updated 4 months ago
Alternatives and similar repositories for Whisper-Finetune
Users who are interested in Whisper-Finetune are comparing it to the libraries listed below.
- Fine-tune the Whisper speech recognition model to support training without timestamp data, training with timestamp data, and training wit…☆1,209Dec 17, 2025Updated 4 months ago
- Torch Audio Forced Aligner for Mixed Chinese (Mandarin or Cantonese) and English.☆61Sep 5, 2025Updated 7 months ago
- (WIP) Long-form speech generation☆31Apr 2, 2025Updated last year
- Text-to-text alignment algorithm for speech recognition error analysis.☆28Apr 6, 2026Updated 2 weeks ago
- A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity…☆15,761Mar 17, 2026Updated last month
- T5Voice is a lightweight PyTorch implementation of T5-based text-to-speech synthesis, supporting both streaming and non-streaming speech …☆28Nov 7, 2025Updated 5 months ago
- Colab notebook for fine-tuning Qwen2-Audio with trl's SFT and PPO trainers.☆24Nov 23, 2024Updated last year
- Multilingual Voice Understanding Model☆8,005Dec 30, 2025Updated 3 months ago
- Pseudo-real-time speech transcription service based on faster-whisper☆239Apr 29, 2025Updated 11 months ago
- Dolphin is a multilingual, multitask ASR model jointly trained by DataoceanAI and Tsinghua University.☆703Apr 17, 2026Updated last week
- Compute WER and SER for speech recognition evaluation☆27Mar 18, 2026Updated last month
- We Speech Transcript based on LLM, in 300 lines of code.☆185Jun 20, 2025Updated 10 months ago
- ☆843Jun 7, 2024Updated last year
- Multi-lingual large voice generation model, providing inference, training and deployment full-stack ability.☆20Mar 12, 2026Updated last month
- Pseudo Streaming SenseVoice with Hotwords☆444Mar 13, 2025Updated last year
- Text-To-Speech for NotebookLM☆39Jul 20, 2025Updated 9 months ago
- 🔥 Speech synthesis (TTS) and voice cloning tutorial: https://dataxujing.github.io/TTS-paper/#/☆11Oct 29, 2024Updated last year
- Open-source industrial-grade ASR models supporting Mandarin, Chinese dialects and English, achieving a new SOTA on public Mandarin ASR be…☆1,845Feb 25, 2026Updated last month
- Term Project at GTCMT exploring phase based features for Singing Voice Detection with Neural Networks☆11Apr 20, 2018Updated 8 years ago
- Python runtime for WeTextProcessing (does not depend on Pynini)☆49Nov 28, 2025Updated 4 months ago
- MooER: Moore-threads Open Omni model for speech-to-speech intERaction. MooER-omni includes a series of end-to-end speech interaction mode…☆218Jan 8, 2025Updated last year
- A simple implementation for improving CosyVoice2 by GRPO method☆37Oct 17, 2025Updated 6 months ago
- Causal streaming adaptation of OpenAI Whisper for real-time transcription on small audio chunks.☆70Mar 31, 2026Updated 3 weeks ago
- Faster Whisper transcription with CTranslate2☆22,361Nov 19, 2025Updated 5 months ago
- Awesome Neural Codec Models, Text-to-Speech Synthesizers & Speech Language Models☆242Dec 18, 2025Updated 4 months ago
- A native-PyTorch library for large scale M-LLM (text/audio) training with tp/cp/dp.☆230Apr 8, 2026Updated 2 weeks ago
- A tool for calculating WER (Word Error Rate) in python.☆14Sep 18, 2024Updated last year
- Onset-and-Offset-Aware Sound Event Detection☆22Feb 10, 2025Updated last year
- ☆559Jul 10, 2024Updated last year
- ☆11Dec 24, 2024Updated last year
- The official repo of Qwen-Audio (通义千问-Audio) chat & pretrained large audio language model proposed by Alibaba Cloud.☆1,891Jul 5, 2024Updated last year
- OSUM & OSUM-EChat, open speech understanding model and empathetic spoken chatbot based on it, open-sourced by ASLP@NPU.☆486Nov 23, 2025Updated 5 months ago
- ☆16Apr 8, 2025Updated last year
- Plug-and-play streaming semantic VAD for real-time full-duplex spoken dialogue systems.☆194Mar 20, 2026Updated last month
- superfast text to speech in any voice☆62Feb 16, 2026Updated 2 months ago
- Distilled variant of Whisper for speech recognition. 6x faster, 50% smaller, within 1% word error rate.☆4,069Jan 8, 2025Updated last year
- Port of Funasr's Sense-voice model in C/C++☆542Dec 19, 2025Updated 4 months ago
- 🤗 R1-AQA Model: mispeech/r1-aqa☆320Mar 28, 2025Updated last year
- Text Normalization & Inverse Text Normalization☆752Feb 27, 2026Updated last month