coqui-ai / whisperX
WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)
★41 · Updated 6 months ago
Alternatives and similar repositories for whisperX:
Users who are interested in whisperX are comparing it to the libraries listed below.
- On-device streaming text-to-speech engine powered by deep learning ★64 · Updated this week
- Real-time processing and delivery of sentences from a continuous stream of characters or text chunks. ★37 · Updated last month
- Pip installable package for StyleTTS 2 human-level text-to-speech and voice cloning ★148 · Updated 6 months ago
- ★90 · Updated 8 months ago
- An API for VoiceCraft. ★26 · Updated 6 months ago
- API server for Instant voice cloning by MyShell. ★80 · Updated 3 months ago
- ★35 · Updated 2 years ago
- Mobile web app for audio "push-to-talk" + TTS chat interface with OpenAI-like APIs ★39 · Updated last year
- ★69 · Updated 10 months ago
- 100% free, local & offline voice assistant with speech recognition ★60 · Updated 3 months ago
- speechlib is a library that can do speaker diarization, transcription and speaker recognition on an audio file to create transcripts with… ★174 · Updated 3 months ago
- A UI for the Piper TTS ★75 · Updated 4 months ago
- On-device speaker recognition engine powered by deep learning ★30 · Updated this week
- ★61 · Updated 2 months ago
- Speech recognition & diarisation solution with text alignment, deployed in AML pipelines ★90 · Updated 8 months ago
- A very simple implementation of edge_tts w/ RVC for oobabooga text-generation-webui. ★42 · Updated 11 months ago
- This public GitHub repository contains code for a fully self-hosted, on-premise transcription solution. ★47 · Updated last month
- Provide Gradio custom components to make the diarization-based audio labeling process easier and faster. ★53 · Updated last month
- ★56 · Updated 4 months ago
- Accepts a Hugging Face model URL, automatically downloads and quantizes it using Bits and Bytes. ★38 · Updated 10 months ago
- ★18 · Updated 2 years ago
- ★23 · Updated last week
- Windows-compatible Fast API implementation of VoiceCraft, the Zero-Shot Speech Editing and Text-to-Speech in the Wild ★19 · Updated 8 months ago
- Faster Tortoise inference than Tortoise Fast Fork ★126 · Updated 8 months ago
- This repo provides a simple Gradio UI to run Qwen2 VL 72B AWQ in venv and have both image and video inferencing work. ★26 · Updated 3 months ago
- Record audio and save a transcription to your system's clipboard with ctranslate2 and faster-whisper. ★83 · Updated this week
- A web search extension for Oobabooga's text-generation-webui (now with nougat) ★68 · Updated 6 months ago
- A QT GUI for large language models ★27 · Updated last year
- Self hosted high quality voice recognition for de-googled Android using whisper. Like Siri or OK Google. ★57 · Updated last year