shuaijiang / Whisper-Finetune
Fine-tune the Whisper speech recognition model, with support for training without timestamp data, training with timestamp data, and training without speech data. Also accelerates inference and supports Web deployment, Windows desktop deployment, and Android deployment.
☆312Dec 22, 2025Updated last month
Alternatives and similar repositories for Whisper-Finetune
Users that are interested in Whisper-Finetune are comparing it to the libraries listed below
- Fine-tune the Whisper speech recognition model to support training without timestamp data, training with timestamp data, and training wit…☆1,189Dec 17, 2025Updated 2 months ago
- BELLE: Be Everyone's Large Language model Engine (open-source Chinese conversational large language model)☆8,281Oct 16, 2024Updated last year
- Torch Audio Forced Aligner for Mixed Chinese (Mandarin or Cantonese) and English.☆62Sep 5, 2025Updated 5 months ago
- We Speech Transcript based on LLM, in 300 lines of code.☆183Jun 20, 2025Updated 7 months ago
- (WIP) Long-form speech generation☆31Apr 2, 2025Updated 10 months ago
- T5Voice is a lightweight PyTorch implementation of T5-based text-to-speech synthesis, supporting both streaming and non-streaming speech …☆28Nov 7, 2025Updated 3 months ago
- Multilingual Voice Understanding Model☆7,497Dec 30, 2025Updated last month
- Multi-lingual large voice generation model, providing inference, training and deployment full-stack ability.☆20Feb 10, 2026Updated last week
- A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity…☆14,891Feb 4, 2026Updated last week
- ☆832Jun 7, 2024Updated last year
- Text-To-Speech for NotebookLM☆37Jul 20, 2025Updated 6 months ago
- A simple implementation for improving CosyVoice2 by GRPO method☆32Oct 17, 2025Updated 4 months ago
- MooER: Moore-threads Open Omni model for speech-to-speech intERaction. MooER-omni includes a series of end-to-end speech interaction mode…☆218Jan 8, 2025Updated last year
- Pseudo Streaming SenseVoice with Hotwords☆429Mar 13, 2025Updated 11 months ago
- Dolphin is a multilingual, multitask ASR model jointly trained by DataoceanAI and Tsinghua University.☆696Nov 27, 2025Updated 2 months ago
- Causal streaming adaptation of OpenAI Whisper for real-time transcription on small audio chunks.☆62Sep 18, 2025Updated 4 months ago
- Open-source industrial-grade ASR models supporting Mandarin, Chinese dialects and English, achieving a new SOTA on public Mandarin ASR be…☆1,766Updated this week
- Compute WER and SER for speech recognition evaluation☆26Dec 15, 2025Updated 2 months ago
- auto push daily news with ai☆13Updated this week
- Pseudo-real-time speech transcription service based on faster-whisper☆238Apr 29, 2025Updated 9 months ago
- Awesome Neural Codec Models, Text-to-Speech Synthesizers & Speech Language Models☆239Dec 18, 2025Updated last month
- ☆97Oct 16, 2025Updated 4 months ago
- A native-PyTorch library for large scale M-LLM (text/audio) training with tp/cp/dp.☆224Aug 6, 2025Updated 6 months ago
- Faster Whisper transcription with CTranslate2☆20,951Nov 19, 2025Updated 2 months ago
- Colab notebook for fine-tuning Qwen2-Audio with trl's SFT and PPO trainers.☆24Nov 23, 2024Updated last year
- Python runtime for WeTextProcessing (does not depend on Pynini)☆48Nov 28, 2025Updated 2 months ago
- ☆39Sep 25, 2025Updated 4 months ago
- The official repo of Qwen-Audio (通义千问-Audio) chat & pretrained large audio language model proposed by Alibaba Cloud.☆1,874Jul 5, 2024Updated last year
- TTSAudioNormalizer is a specialized tool for TTS data production, featuring descriptive statistical analysis of audio loudness and loud…☆111Dec 20, 2024Updated last year
- Implemented a script that automatically adjusts Qwen3's inference and non-inference capabilities, based on an OpenAI-like API. The infere…☆22May 9, 2025Updated 9 months ago
- FlashCosyVoice: A lightweight vLLM implementation built from scratch for CosyVoice.☆242Nov 11, 2025Updated 3 months ago
- A Repository for Single- and Multi-modal Speaker Verification, Speaker Recognition and Speaker Diarization☆2,778Dec 8, 2025Updated 2 months ago
- 🤗 R1-AQA Model: mispeech/r1-aqa☆314Mar 28, 2025Updated 10 months ago
- Distilled variant of Whisper for speech recognition. 6x faster, 50% smaller, within 1% word error rate.☆4,039Jan 8, 2025Updated last year
- SpeechAgents: Human-Communication Simulation with Multi-Modal Multi-Agent Systems☆85Jan 9, 2024Updated 2 years ago
- This project trains an RWKV LLM for TTS generation that is compatible with other TTS engines (like fish/cosy/chattts).☆94Oct 8, 2025Updated 4 months ago
- ☆69Jul 17, 2024Updated last year
- OSUM & OSUM-EChat, open speech understanding model and empathetic spoken chatbot based on it, open-sourced by ASLP@NPU.☆480Nov 23, 2025Updated 2 months ago
- ☆558Jul 10, 2024Updated last year