abus-aikorea / voice-pro
Gradio WebUI for creators and developers, featuring key TTS (Edge-TTS, kokoro) and zero-shot Voice Cloning (E2 & F5-TTS, CosyVoice), with Whisper audio processing, YouTube download, Demucs vocal isolation, and multilingual translation.
☆3,610 Updated last week
Alternatives and similar repositories for voice-pro:
Users who are interested in voice-pro are comparing it to the libraries listed below.
- TTS with kokoro and onnx runtime ☆1,901 Updated 2 weeks ago
- Di♪♪Rhythm: Blazingly Fast and Embarrassingly Simple End-to-End Full-Length Song Generation with Latent Diffusion ☆1,475 Updated this week
- Generate audiobooks from e-books ☆3,282 Updated last month
- Towards Human-Sounding Speech ☆4,490 Updated last week
- https://hf.co/hexgrad/Kokoro-82M ☆2,432 Updated 2 weeks ago
- Local realtime voice AI ☆2,279 Updated last month
- Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching" ☆11,448 Updated last week
- Omni SenseVoice: High-Speed Speech Recognition with words timestamps 🗣️🎯 ☆833 Updated last month
- Synchronized Translation for Videos. Video dubbing ☆1,103 Updated 2 months ago
- Fast and accurate automatic speech recognition (ASR) for edge devices ☆2,687 Updated last month
- An AI-Powered Speech Processing Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Enhancement, Separation, and Target Spe… ☆2,644 Updated last week
- A superfast full-text search application ☆1,088 Updated 4 months ago
- Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junio… ☆8,969 Updated 2 weeks ago
- Enable AI models for video production in the browser ☆1,616 Updated last month
- Interface for OuteTTS models. ☆1,178 Updated last week
- A fast multimodal LLM for real-time voice ☆3,855 Updated 2 months ago
- Hibiki is a model for streaming speech translation (also known as simultaneous translation). Unlike offline translation—where one waits f… ☆996 Updated last week
- Inference and training library for high-quality TTS models. ☆5,212 Updated 4 months ago
- first base model for full-duplex conversational audio ☆1,731 Updated 3 months ago
- StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models ☆5,656 Updated 8 months ago
- A passive recording project allows you to have complete control over your data. Automatically take screenshots of all your screens, index… ☆1,244 Updated last week
- AI app store powered by 24/7 desktop history. open source | 100% local | dev friendly | 24/7 screen, mic recording ☆13,542 Updated last week
- An Industrial-Level Controllable and Efficient Zero-Shot Text-To-Speech System ☆1,346 Updated this week
- OCR & Document Extraction using vision models ☆11,011 Updated this week
- Local voice chatbot for engaging conversations, powered by Ollama, Hugging Face Transformers, and Coqui TTS Toolkit ☆762 Updated 8 months ago
- YuE: Open Full-song Music Generation Foundation Model, something similar to Suno.ai but open ☆4,842 Updated 2 weeks ago
- A self-hosted API that takes a URL and returns a file with browser screenshots. ☆959 Updated last month
- 🔥 Open Source Browser API for AI Agents & Apps. Steel Browser is a batteries-included browser instance that lets you automate the web wi… ☆4,239 Updated this week
- Share your screen with one simple room code. No downloads or sign-ups required. ☆1,557 Updated 4 months ago
- 🔄 CLI to convert Webpages to PDFs 🚀 ☆1,234 Updated 3 months ago