icynic / desktop-live-caption
Transcribes desktop (computer) audio locally and in real time (streaming ASR), using TorchAudio with the Emformer-RNNT model for inference, PyAudio for reading the audio stream, and Tkinter for the GUI.
★10 · Updated 6 months ago
Related projects
Alternatives and complementary repositories for desktop-live-caption
- Pip-installable package for StyleTTS 2 human-level text-to-speech and voice cloning ★138 · Updated 4 months ago
- Efficient approach to speaker diarization using voice characteristics extraction ★68 · Updated 6 months ago
- Versatile AI-driven audio upscaler to enhance the quality of any audio. ★60 · Updated 2 months ago
- StyleTTS-ZS: Efficient High-Quality Zero-Shot Text-to-Speech Synthesis with Distilled Time-Varying Style Diffusion ★159 · Updated last month
- Google's SoundStorm: Efficient Parallel Audio Generation ★129 · Updated last year
- An implementation for training the HiFi-GAN part of the XTTSv2 model using Coqui/TTS. ★61 · Updated last week
- Python Audio Separator in Real Time using MDX-NET model ★12 · Updated last year
- Create an LJSpeech structured voice dataset on wave input ★21 · Updated last month
- Chat with your RVC models. See website for demo: ★20 · Updated 9 months ago
- ★176 · Updated last month
- Provide Gradio custom components to make the diarization-based audio labeling process easier and faster. ★45 · Updated 2 weeks ago
- Your one-stop solution for voice dataset creation ★112 · Updated 11 months ago
- A minimalistic automatic speech recognition Streamlit-based web app powered by OpenAI's Whisper "State of the Art" models ★65 · Updated 2 years ago
- List of repositories relevant to VITS. ★35 · Updated last year
- VALL-E 2 reproduction ★87 · Updated 4 months ago
- An espeak-compatible, permissively-licensed IPA phonemizer (G2P) based on DeepPhonemizer. Usable as a drop-in replacement for espeak's GP… ★83 · Updated last month
- ★87 · Updated 6 months ago
- Real-time end-to-end singing voice conversion ★18 · Updated 2 weeks ago
- AI-powered speech denoising and enhancement. Adapted for Windows and optimized ★64 · Updated 4 months ago
- Go from raw audio files to a text-audio dataset automatically with OpenAI's Whisper. ★133 · Updated last year
- On-device streaming text-to-speech engine powered by deep learning ★57 · Updated this week
- Create labeled datasets, enhance audio quality, identify speakers, support diverse dataset types. Advanced audio processing. ★209 · Updated 5 months ago
- Community framework for training tortoise ★38 · Updated 2 years ago
- An unofficial PyTorch implementation of VALL-E ★77 · Updated this week
- Fine-Tune Whisper with Transformers and PEFT ★38 · Updated last year
- Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching" ★28 · Updated this week
- C++ version of pyannote audio speaker diarization pipeline ★18 · Updated 9 months ago
- Faster Tortoise inference than Tortoise Fast Fork ★122 · Updated 7 months ago
- Running the F5-TTS by ONNX Runtime ★39 · Updated this week