Fine-tune the Whisper speech recognition model to support training without timestamp data, training with timestamp data, and training without speech data. Accelerate inference and support Web deployment, Windows desktop deployment, and Android deployment
☆316 · Dec 22, 2025 · Updated 3 months ago
Alternatives and similar repositories for Whisper-Finetune
Users that are interested in Whisper-Finetune are comparing it to the libraries listed below.
- Fine-tune the Whisper speech recognition model to support training without timestamp data, training with timestamp data, and training wit… ☆1,204 · Dec 17, 2025 · Updated 3 months ago
- BELLE: Be Everyone's Large Language model Engine (open-source Chinese conversational large language model) ☆8,287 · Oct 16, 2024 · Updated last year
- Torch Audio Forced Aligner for Mixed Chinese (Mandarin or Cantonese) and English. ☆61 · Sep 5, 2025 · Updated 6 months ago
- (WIP) Long-form speech generation ☆31 · Apr 2, 2025 · Updated last year
- Text-to-text alignment algorithm for speech recognition error analysis. ☆28 · Updated this week
- A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity… ☆15,456 · Mar 17, 2026 · Updated 2 weeks ago
- T5Voice is a lightweight PyTorch implementation of T5-based text-to-speech synthesis, supporting both streaming and non-streaming speech … ☆28 · Nov 7, 2025 · Updated 4 months ago
- Colab notebook for fine-tuning Qwen2-Audio with trl's SFT and PPO trainers. ☆24 · Nov 23, 2024 · Updated last year
- Multilingual Voice Understanding Model ☆7,880 · Dec 30, 2025 · Updated 3 months ago
- Pseudo-real-time speech transcription service based on faster-whisper ☆239 · Apr 29, 2025 · Updated 11 months ago
- Dolphin is a multilingual, multitask ASR model jointly trained by DataoceanAI and Tsinghua University. ☆700 · Mar 19, 2026 · Updated 2 weeks ago
- Compute WER and SER for speech recognition evaluation ☆27 · Mar 18, 2026 · Updated 2 weeks ago
- We Speech Transcript based on an LLM, in 300 lines of code. ☆185 · Jun 20, 2025 · Updated 9 months ago
- ☆839 · Jun 7, 2024 · Updated last year
- Multilingual large voice generation model, providing full-stack inference, training, and deployment capabilities. ☆20 · Mar 12, 2026 · Updated 3 weeks ago
- Pseudo Streaming SenseVoice with Hotwords ☆442 · Mar 13, 2025 · Updated last year
- Text-To-Speech for NotebookLM ☆39 · Jul 20, 2025 · Updated 8 months ago
- 🔥 Speech synthesis (TTS) and voice cloning tutorial: https://dataxujing.github.io/TTS-paper/#/ ☆11 · Oct 29, 2024 · Updated last year
- Open-source industrial-grade ASR models supporting Mandarin, Chinese dialects and English, achieving a new SOTA on public Mandarin ASR be… ☆1,825 · Feb 25, 2026 · Updated last month
- Python runtime for WeTextProcessing (does not depend on Pynini) ☆49 · Nov 28, 2025 · Updated 4 months ago
- MooER: Moore-threads Open Omni model for speech-to-speech intERaction. MooER-omni includes a series of end-to-end speech interaction mode… ☆218 · Jan 8, 2025 · Updated last year
- A simple implementation for improving CosyVoice2 with the GRPO method ☆35 · Oct 17, 2025 · Updated 5 months ago
- Causal streaming adaptation of OpenAI Whisper for real-time transcription on small audio chunks. ☆67 · Updated this week
- Plug-and-play streaming semantic VAD for real-time full-duplex spoken dialogue systems. ☆156 · Mar 20, 2026 · Updated 2 weeks ago
- Faster Whisper transcription with CTranslate2 ☆21,906 · Nov 19, 2025 · Updated 4 months ago
- Awesome Neural Codec Models, Text-to-Speech Synthesizers & Speech Language Models ☆242 · Dec 18, 2025 · Updated 3 months ago
- Superfast text-to-speech in any voice ☆61 · Feb 16, 2026 · Updated last month
- A native-PyTorch library for large-scale M-LLM (text/audio) training with tp/cp/dp. ☆228 · Aug 6, 2025 · Updated 7 months ago
- A tool for calculating WER (Word Error Rate) in Python. ☆14 · Sep 18, 2024 · Updated last year
- Onset-and-Offset-Aware Sound Event Detection ☆21 · Feb 10, 2025 · Updated last year
- ☆560 · Jul 10, 2024 · Updated last year
- ☆11 · Dec 24, 2024 · Updated last year
- The official repo of Qwen-Audio (通义千问-Audio) chat & pretrained large audio language model proposed by Alibaba Cloud. ☆1,882 · Jul 5, 2024 · Updated last year
- OSUM & OSUM-EChat, open speech understanding model and empathetic spoken chatbot based on it, open-sourced by ASLP@NPU. ☆483 · Nov 23, 2025 · Updated 4 months ago
- Distilled variant of Whisper for speech recognition. 6x faster, 50% smaller, within 1% word error rate. ☆4,061 · Jan 8, 2025 · Updated last year
- Port of Funasr's Sense-voice model in C/C++ ☆538 · Dec 19, 2025 · Updated 3 months ago
- 🤗 R1-AQA Model: mispeech/r1-aqa ☆319 · Mar 28, 2025 · Updated last year
- Text Normalization & Inverse Text Normalization ☆737 · Feb 27, 2026 · Updated last month
- A Repository for Single- and Multi-modal Speaker Verification, Speaker Recognition and Speaker Diarization ☆2,863 · Dec 8, 2025 · Updated 3 months ago