icynic / desktop-live-caption
Transcribes desktop (computer) audio locally and in real time (streaming ASR), using TorchAudio with the Emformer RNN-T model for inference, PyAudio for reading the audio stream, and Tkinter for the GUI.
★11 · Updated 8 months ago
Alternatives and similar repositories for desktop-live-caption:
Users who are interested in desktop-live-caption are comparing it to the libraries listed below.
- Efficient approach to speaker diarization using voice characteristics extraction ★83 · Updated 9 months ago
- Pip-installable package for StyleTTS 2 human-level text-to-speech and voice cloning ★150 · Updated 6 months ago
- ONNX Inference of Pyannote Segmentation ★81 · Updated last month
- G2P ★35 · Updated this week
- Running the F5-TTS by ONNX Runtime ★91 · Updated this week
- On-device speaker diarization powered by deep learning ★34 · Updated 2 weeks ago
- Provide Gradio custom components to make the diarization-based audio labeling process easier and faster. ★53 · Updated last week
- Misc. tools/scripts that I made to use for tortoise ★21 · Updated 5 months ago
- Zero-shot real-time TTS system, fully offline, free and open source ★24 · Updated 2 weeks ago
- Split long audio files based on subtitle-info in SRT File (Transcript saved in CSV) ★20 · Updated 5 years ago
- Convert your PDFs into audiobooks effortlessly. Features intelligent text extraction, customizable text-to-speech settings, and efficient… ★37 · Updated 2 weeks ago
- ★90 · Updated 9 months ago
- Create an LJSpeech structured voice dataset on wave input ★24 · Updated 4 months ago
- SadTalker gradio_demo.py file with code section that allows you to set the eye blink and pose reference videos for the software to use wh… ★11 · Updated last year
- Create labeled datasets, enhance audio quality, identify speakers, support diverse dataset types. Advanced audio processing. ★231 · Updated 7 months ago
- VoiceRestore: Flow-Matching Transformers for Universal Speech Restoration ★114 · Updated 2 weeks ago
- Fine-Tune Whisper with Transformers and PEFT ★48 · Updated last year
- An implementation for training the HiFi-GAN part of the XTTSv2 model using Coqui/TTS. ★66 · Updated 2 months ago
- An espeak-compatible, permissively-licensed IPA phonemizer (G2P) based on DeepPhonemizer. Usable as a drop-in replacement for espeak's GP… ★90 · Updated 3 months ago
- StyleTTS-ZS: Efficient High-Quality Zero-Shot Text-to-Speech Synthesis with Distilled Time-Varying Style Diffusion ★169 · Updated 4 months ago
- ONNX-compatible Fast SeamlessM4T: Massively Multilingual & Multimodal Machine Translation ★42 · Updated last year
- Speech recognition & diarisation solution with text alignment, deployed in AML pipelines ★91 · Updated 8 months ago
- StyleTTS-ZS: Efficient High-Quality Zero-Shot Text-to-Speech Synthesis with Distilled Time-Varying Style Diffusion ★10 · Updated 4 months ago
- This is the audio sample repository for speech separation model "MossFormer2". ★120 · Updated 2 months ago
- On-device streaming text-to-speech engine powered by deep learning ★64 · Updated 2 weeks ago
- Whisper combined with Silero VAD, for improved long-form transcriptions ★45 · Updated 2 years ago
- ★195 · Updated 3 months ago
- Community framework for training tortoise ★40 · Updated 2 years ago
- ez audio transcription tool with flexible processing and post-processing options ★141 · Updated 11 months ago
- Uses deepgram/whisper/custom models to create an LJSpeech dataset for voice model fine tuning ★21 · Updated 3 weeks ago