icynic / desktop-live-caption
Transcribe desktop (computer) audio in real time and locally with streaming ASR, using TorchAudio and the Emformer-RNNT model for inference, PyAudio for reading the audio stream, and Tkinter for the GUI.
⭐ 14 · Updated last year
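Streaming ASR of the kind described above processes audio in fixed-size segments rather than as one long recording, so the audio read loop has to re-chunk the incoming samples. The sketch below is a hypothetical, dependency-free illustration of that re-chunking. It is loosely modeled on the `ContextCacher` helper from TorchAudio's online ASR tutorial and is not taken from desktop-live-caption. `segments`, `segment_length` and `context_length` are illustrative names and values in samples. In a real pipeline, each returned chunk would go to a streaming feature extractor and an RNN-T decoder. Those steps are omitted here.

```python
from typing import Iterable, Iterator, List


class ContextCacher:
    """Prepend the tail of the previous chunk to each new chunk.

    Hypothetical pure-Python sketch; sizes are in samples and are
    illustrative, not values used by desktop-live-caption.
    """

    def __init__(self, segment_length: int, context_length: int) -> None:
        if not 0 <= context_length <= segment_length:
            raise ValueError("context_length must be in [0, segment_length]")
        self.segment_length = segment_length
        self.context_length = context_length
        # Start with silence so the first chunk has the same size as the rest.
        self.context: List[float] = [0.0] * context_length

    def __call__(self, chunk: List[float]) -> List[float]:
        # Zero-pad a short final chunk up to a full segment.
        if len(chunk) < self.segment_length:
            chunk = chunk + [0.0] * (self.segment_length - len(chunk))
        chunk_with_context = self.context + chunk
        # Keep the last context_length samples for the next call.
        self.context = chunk[len(chunk) - self.context_length:]
        return chunk_with_context


def segments(samples: Iterable[float], segment_length: int) -> Iterator[List[float]]:
    """Group a flat sample stream into fixed-size segments (the last may be short)."""
    buf: List[float] = []
    for s in samples:
        buf.append(s)
        if len(buf) == segment_length:
            yield buf
            buf = []
    if buf:
        yield buf
```

For example, with `segment_length=4` and `context_length=2`, the samples 1 to 9 become three chunks: `[0, 0, 1, 2, 3, 4]`, `[3, 4, 5, 6, 7, 8]` and `[7, 8, 9, 0, 0, 0]`. Every chunk has the same size, which keeps the downstream model's input shape constant.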
Alternatives and similar repositories for desktop-live-caption
Users who are interested in desktop-live-caption are comparing it to the libraries listed below.
- Pip installable package for StyleTTS 2 human-level text-to-speech and voice cloning ⭐ 161 · Updated last year
- Create labeled datasets, enhance audio quality, identify speakers, support diverse dataset types. Advanced audio processing. ⭐ 258 · Updated last year
- Fine Tune the Style-TTS2 Voice Model ⭐ 266 · Updated 7 months ago
- Automatically cleaning, enhancing, segmenting, filtering, and formatting a dataset to fine tune or train a voice model. ⭐ 48 · Updated 4 months ago
- Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching" ⭐ 66 · Updated last year
- ⭐ 55 · Updated 3 weeks ago
- ⭐ 297 · Updated 6 months ago
- OpenVINO version of openai/whisper ⭐ 182 · Updated 2 years ago
- SoTA open-source TTS ⭐ 135 · Updated 8 months ago
- Efficient approach to speaker diarization using voice characteristics extraction ⭐ 106 · Updated 7 months ago
- Synchronize Whisper's timestamps over an existing accurate transcription ⭐ 160 · Updated last year
- Examples of using the llasa-tts models locally ⭐ 182 · Updated 9 months ago
- Real-time processing and delivery of sentences from a continuous stream of characters or text chunks. ⭐ 74 · Updated 6 months ago
- A highly optimized engine for the neutts-air model to generate minutes of audio in seconds. Over 200x real time on modern hardware! ⭐ 110 · Updated 2 months ago
- Running F5-TTS with ONNX Runtime ⭐ 191 · Updated last month
- Very fast, accurate speaker diarization ⭐ 223 · Updated last month
- C++ library for converting text to phonemes for Piper ⭐ 139 · Updated 6 months ago
- ⭐ 100 · Updated last year
- A TTS model capable of generating ultra-realistic dialogue in one pass. ⭐ 219 · Updated 9 months ago
- Real-Time Whisper Voice Recognition with Vosk model feedback. ⭐ 121 · Updated 2 years ago
- Speech recognition & diarisation solution with text alignment, deployed in AML pipelines ⭐ 100 · Updated last year
- StyleTTS-ZS: Efficient High-Quality Zero-Shot Text-to-Speech Synthesis with Distilled Time-Varying Style Diffusion ⭐ 187 · Updated last year
- Create an LJSpeech-structured voice dataset from wave input ⭐ 37 · Updated last year
- Fast audio super resolution from 16 kHz to 48 kHz. ⭐ 192 · Updated last month
- On-device streaming text-to-speech engine powered by deep learning ⭐ 128 · Updated 2 weeks ago
- Whisper combined with Silero VAD, for improved long-form transcriptions ⭐ 54 · Updated 3 years ago
- Speaker diarization model ⭐ 32 · Updated 2 years ago
- Streaming and Fine-tuning for Chatterbox TTS ⭐ 264 · Updated 7 months ago
- ONNX-compatible Fast SeamlessM4T: Massively Multilingual & Multimodal Machine Translation ⭐ 43 · Updated 2 years ago
- Faster Tortoise inference than Tortoise Fast Fork ⭐ 127 · Updated last year