icynic / desktop-live-caption
Transcribes desktop (computer) audio locally and in real time (streaming ASR), using TorchAudio's Emformer RNN-T model for inference, PyAudio to read the audio stream, and Tkinter for the GUI.
☆12 · Updated last year
Alternatives and similar repositories for desktop-live-caption:
Users who are interested in desktop-live-caption compare it to the repositories listed below:
- AI-powered speech denoising and enhancement. Adapted for Windows and optimized. ☆85 · Updated 9 months ago
- Automatically cleaning, enhancing, segmenting, filtering, and formatting a dataset to fine tune or train a voice model. ☆35 · Updated 2 weeks ago
- Misc. tools/scripts that I made to use for tortoise ☆21 · Updated 8 months ago
- ☆223 · Updated last month
- Advanced RVC Inference for quicker and effortless model downloads ☆49 · Updated last month
- Running the F5-TTS by ONNX Runtime ☆148 · Updated last week
- Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching" ☆58 · Updated 5 months ago
- Official implementation of the TTS model Lina-Speech ☆164 · Updated 4 months ago
- 🐍 🤖 Pip installable package for StyleTTS 2 human-level text-to-speech and voice cloning ☆159 · Updated 9 months ago
- RTVC: Real-Time Voice Conversion GUI ☆55 · Updated last year
- TTS support with GGML ☆32 · Updated this week
- Your one-stop solution for voice dataset creation ☆119 · Updated last year
- Chat with your RVC models. See website for demo: ☆22 · Updated last year
- A TTS model capable of generating ultra-realistic dialogue in one pass. ☆125 · Updated 2 weeks ago
- A TTS model capable of generating ultra-realistic dialogue in one pass. ☆52 · Updated this week
- ☆96 · Updated last year
- High quality text-to-speech based on StyleTTS 2. ☆39 · Updated this week
- RVC Onnx Infer- Upgraded and simplified-ish ☆22 · Updated last year
- Efficient approach to speaker diarization using voice characteristics extraction ☆94 · Updated last year
- A collection of neural vocoders suitable for singing voice synthesis tasks. ☆122 · Updated last month
- 🔊 Create labeled datasets, enhance audio quality, identify speakers, support diverse dataset types. 🎧👥📊 Advanced audio processing. ☆243 · Updated 11 months ago
- StyleTTS-ZS: Efficient High-Quality Zero-Shot Text-to-Speech Synthesis with Distilled Time-Varying Style Diffusion ☆176 · Updated 7 months ago
- Robust Speech Recognition via Large-Scale Weak Supervision ☆30 · Updated last year
- ChatTTS is a generative speech model for daily dialogue. ☆22 · Updated 4 months ago
- Towards High-Quality and Efficient Speech Bandwidth Extension with Parallel Amplitude and Phase Prediction ☆116 · Updated 3 weeks ago
- YuE with mp3 extend, exllama and GUI ☆48 · Updated 2 months ago
- Ultimate Vocal Remover CLI ☆139 · Updated 3 months ago
- Real-time end-to-end singing voice conversion ☆21 · Updated 6 months ago
- Zero-shot realtime TTS system, fully offline, free and open source ☆35 · Updated 3 weeks ago
- List of repositories relevant to VITS. ☆36 · Updated 2 years ago