icynic / desktop-live-caption
Transcribe desktop (computer) audio locally and in real time with streaming ASR, using TorchAudio and an Emformer-RNNT model for inference, PyAudio for reading the audio stream, and Tkinter for the GUI.
☆14 Updated last year
Alternatives and similar repositories for desktop-live-caption
Users interested in desktop-live-caption are comparing it to the libraries listed below.
- Efficient approach to speaker diarization using voice characteristics extraction ☆106 Updated 7 months ago
- Synchronize Whisper's timestamps over an existing accurate transcription ☆160 Updated last year
- openvino version of openai/whisper ☆182 Updated 2 years ago
- Speech recognition & diarisation solution with text alignment, deployed in AML pipelines ☆100 Updated last year
- Real-Time Whisper Voice Recognition with vosk model feedback. ☆121 Updated 2 years ago
- ☆55 Updated 2 weeks ago
- 🔊 Create labeled datasets, enhance audio quality, identify speakers, support diverse dataset types. 🎧👥📊 Advanced audio processing. ☆258 Updated last year
- Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching" ☆66 Updated last year
- Running the F5-TTS by ONNX Runtime ☆191 Updated last month
- 🐍 🤖 Pip installable package for StyleTTS 2 human-level text-to-speech and voice cloning ☆161 Updated last year
- Real-time processing and delivery of sentences from a continuous stream of characters or text chunks. ☆74 Updated 6 months ago
- A high-throughput and memory-efficient inference and serving engine for Whisper, https://mesolitica.com/blog/vllm-whisper ☆31 Updated last year
- C++ library for converting text to phonemes for Piper ☆138 Updated 6 months ago
- Whisper combined with Silero VAD, for improved long-form transcriptions ☆54 Updated 3 years ago
- ☆100 Updated last year
- ez audio transcription tool with flexible processing and post-processing options ☆162 Updated 2 years ago
- Fine Tune the Style-TTS2 Voice Model ☆266 Updated 7 months ago
- A TTS model capable of generating ultra-realistic dialogue in one pass. ☆219 Updated 9 months ago
- Go from raw audio files to a text-audio dataset automatically with OpenAI's Whisper. ☆137 Updated 2 years ago
- Timething is a library for aligning text transcripts with their audio recordings. ☆128 Updated last year
- ONNX-compatible Fast SeamlessM4T—Massively Multilingual & Multimodal Machine Translation ☆43 Updated 2 years ago
- Automatically cleaning, enhancing, segmenting, filtering, and formatting a dataset to fine tune or train a voice model. ☆48 Updated 4 months ago
- On-device voice activity detection (VAD) powered by deep learning ☆243 Updated 2 weeks ago
- ONNX Inference of Pyannote Segmentation ☆97 Updated last year
- Tool to make high quality text to speech (tts) corpus from audio + text books. ☆28 Updated 6 months ago
- Very fast, accurate speaker diarization ☆223 Updated last month
- On-device streaming text-to-speech engine powered by deep learning ☆127 Updated 2 weeks ago
- Open-source reproducible benchmarks from Argmax ☆77 Updated 2 weeks ago
- Official implementation of the TTS model Lina-Speech ☆176 Updated last year
- Subtitle to audio, generate audio from any subtitle file using Coqui-ai TTS and synchronize the audio timing according to subtitle time. ☆120 Updated 2 years ago