Vaibhavs10 / fast-whisper-finetuning
☆459Updated 4 months ago
Related projects
Alternatives and complementary repositories for fast-whisper-finetuning
- Fine-tune and evaluate Whisper models for Automatic Speech Recognition (ASR) on custom datasets or datasets from Hugging Face.☆259Updated last year
- ☆347Updated 8 months ago
- ☆256Updated 5 months ago
- Code and Pretrained Models for Interspeech 2023 Paper "Whisper-AT: Noise-Robust Automatic Speech Recognizers are Also Strong Audio Event …☆322Updated 9 months ago
- Experimental code: sound file preprocessing to optimize Whisper transcriptions without hallucinated texts☆275Updated last week
- Minimal extension of OpenAI's Whisper adding speaker diarization with special tokens☆444Updated last year
- ☆307Updated 2 months ago
- An Optimized Speech-to-Text Pipeline for the Whisper Model Supporting Multiple Inference Engines☆312Updated 2 months ago
- NeMo text processing for ASR and TTS☆284Updated this week
- EMNLP 23 - Integrating Whisper Encoder to LLaMA Decoder for Generative ASR Error Correction☆232Updated 6 months ago
- A Python package to build AI-powered real-time audio applications☆1,090Updated 4 months ago
- Implementation of Voicebox, a new SOTA text-to-speech network from Meta AI, in PyTorch☆614Updated last month
- Open source inference code for Rev's model☆333Updated this week
- Improving transcription performance of OpenAI Whisper for CPU-based deployment☆237Updated 2 years ago
- Verbatim Automatic Speech Recognition with improved word-level timestamps and filler detection☆262Updated 2 months ago
- Whisper realtime streaming for long speech-to-text transcription and translation☆103Updated 9 months ago
- Pybind11 bindings for Whisper.cpp☆325Updated this week
- Vocos: Closing the gap between time-domain and Fourier-based neural vocoders for high-quality audio synthesis☆827Updated 3 months ago
- Text to speech alignment using CTC forced alignment☆137Updated 3 weeks ago
- Implementation of Meta-Voicebox: the first generative AI model for speech to generalize across tasks with state-of-the-art performance.☆567Updated last year
- ☆253Updated 8 months ago
- 💬 ASR FastAPI server using faster-whisper and Multi-Scale Auto-Tuning Spectral Clustering for diarization.☆196Updated 3 weeks ago
- Research and Production Oriented Speaker Verification, Recognition and Diarization Toolkit☆729Updated last week
- ☆519Updated 6 months ago
- Finetune VITS and MMS using HuggingFace's tools☆122Updated 7 months ago
- Unofficial VITS2-TTS implementation in PyTorch☆488Updated 7 months ago
- HuggingSound: A toolkit for speech-related tasks based on Hugging Face's tools☆432Updated last year
- Multi-Scale Neural Audio Codec (SNAC) compresses audio into discrete codes at a low bitrate☆438Updated this week
- VITS2: Improving Quality and Efficiency of Single-Stage Text-to-Speech with Adversarial Learning and Architecture Design☆497Updated last year
- Live speech recognition using Facebook's wav2vec 2.0 model.☆328Updated 9 months ago