modelscope / ClearerVoice-Studio
An AI-Powered Speech Processing Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Enhancement, Separation, and Target Speaker Extraction, etc.
☆2,428 Updated last month
Alternatives and similar repositories for ClearerVoice-Studio:
Users who are interested in ClearerVoice-Studio are comparing it to the libraries listed below.
- InspireMusic: A Unified Framework for Music, Song, Audio Generation. ☆977 Updated last week
- Interface for OuteTTS models. ☆955 Updated last month
- The official repo of Qwen2-Audio chat & pretrained large audio language model proposed by Alibaba Cloud. ☆1,599 Updated 7 months ago
- Fast and accurate automatic speech recognition (ASR) for edge devices ☆2,633 Updated 3 weeks ago
- An Open-Sourced LLM-empowered Foundation TTS System ☆632 Updated 5 months ago
- StreamSpeech is an “All in One” seamless model for offline and simultaneous speech recognition, speech translation and speech synthesis. ☆1,042 Updated 6 months ago
- zero-shot voice conversion & singing voice conversion, with real-time support ☆1,957 Updated this week
- Inference code for the paper "Spirit-LM Interleaved Spoken and Written Language Model". ☆887 Updated 4 months ago
- GLM-4-Voice | End-to-end Chinese-English speech dialogue model ☆2,770 Updated 3 months ago
- AI powered speech denoising and enhancement ☆1,695 Updated 3 months ago
- first base model for full-duplex conversational audio ☆1,719 Updated 2 months ago
- TangoFlux: Super Fast and Faithful Text to Audio Generation with Flow Matching ☆686 Updated 2 weeks ago
- TTS with kokoro and onnx runtime ☆1,789 Updated 3 weeks ago
- Taming Stable Diffusion for Lip Sync! ☆2,984 Updated this week
- [CVPR 2025] Taming Multimodal Joint Training for High-Quality Video-to-Audio Synthesis ☆1,231 Updated last week
- open-source multimodal large language model that can hear, talk while thinking. Featuring real-time end-to-end speech input and streaming… ☆3,217 Updated 4 months ago
- Multilingual Voice Understanding Model ☆4,944 Updated 2 months ago
- https://hf.co/hexgrad/Kokoro-82M ☆1,717 Updated 3 weeks ago
- LLaSA: Scaling Train-time and Inference-time Compute for LLaMA-based Speech Synthesis ☆467 Updated last week
- A Fast TTS Engine ☆468 Updated last month
- 🍦 Speech-AI-Forge is a project developed around TTS generation model, implementing an API Server and a Gradio-based WebUI. ☆1,123 Updated last week
- LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve spee… ☆2,857 Updated 4 months ago
- Memory-Guided Diffusion for Expressive Talking Video Generation ☆763 Updated last month
- Local realtime voice AI ☆2,256 Updated 2 weeks ago