egorsmkv / optimized-whisper
Use quantized versions of Whisper to speed up inference
☆11 Updated last month
Related projects
Alternatives and complementary repositories for optimized-whisper
- This app is intended to automatically create a corpus for ASR systems using pseudo-labeling. ☆27 Updated 9 months ago
- Collection of scripts from mHuBERT-147. ☆22 Updated this week
- Provide Gradio custom components to make the diarization-based audio labeling process easier and faster. ☆45 Updated 2 weeks ago
- An espeak-compatible, permissively-licensed IPA phonemizer (G2P) based on DeepPhonemizer. Usable as a drop-in replacement for espeak's GP… ☆83 Updated last month
- Speech-MASSIVE is a multilingual Spoken Language Understanding (SLU) dataset comprising the speech counterpart for a portion of the MASSI… ☆19 Updated 2 months ago
- GPT-style network for phonemization with durations of text ☆62 Updated 8 months ago
- Normalize Text in Russian ☆24 Updated last year
- Unofficial implementation of wavenext vocoder ☆32 Updated 2 months ago
- Supervoice diffusion enhance ☆24 Updated 4 months ago
- A simple IPA phonemizer based on ruaccent-encoder ☆14 Updated last month
- Official repository for the "Powerset multi-class cross entropy loss for neural speaker diarization" paper published in Interspeech 2023. ☆71 Updated last year
- ☆13 Updated last week
- ☆13 Updated 2 months ago
- The TTSDS benchmark evaluates synthetic speech quality by considering prosody, speaker identity, and intelligibility, comparing these fac… ☆19 Updated last week
- A toolkit to calculate speech audio quality. Not affiliated with the original authors ☆39 Updated 3 months ago
- ☆57 Updated 2 months ago
- Companion repo for the paper "PixIT: Joint Training of Speaker Diarization and Speech Separation from Real-world Multi-speaker Recordings… ☆46 Updated 5 months ago
- ☆26 Updated 8 months ago
- Zero-Shot Emotion Style Transfer ☆37 Updated 7 months ago
- ☆17 Updated 3 months ago
- VoiceBox neural network implementation ☆96 Updated 3 months ago
- The implementation for "Empowering Whisper as a Joint Multi-Talker and Target-Talker Speech Recognition System". ☆18 Updated 2 months ago
- Joint speech-language model - respond directly to audio! ☆30 Updated 6 months ago
- Official implementation of paper: Frame-Wise Breath Detection with Self-Training: An Exploration of Enhancing Breath Naturalness in Text-… ☆21 Updated 2 months ago
- SoloAudio: Target Sound Extraction with Language-oriented Audio Diffusion Transformer. ☆66 Updated last week
- VALL-E 2 reproduction ☆87 Updated 4 months ago
- ☆54 Updated this week
- Contains the code associated with the ICLR submission for our text-to-speech diffusion model ☆50 Updated last year
- This is the official implementation of our multi-channel multi-speaker multi-spatial neural audio codec architecture. ☆42 Updated 2 months ago
- Just another FastSpeech 2 but cleaner code :) ☆25 Updated 4 months ago