modelscope / ClearerVoice-Studio
An AI-Powered Speech Processing Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Enhancement, Separation, and Target Speaker Extraction, etc.
☆2,517 Updated last week
Alternatives and similar repositories for ClearerVoice-Studio:
Users who are interested in ClearerVoice-Studio are comparing it to the libraries listed below.
- Multilingual Voice Understanding Model ☆5,180 Updated last week
- The official repo of Qwen2-Audio chat & pretrained large audio language model proposed by Alibaba Cloud. ☆1,648 Updated 7 months ago
- InspireMusic: A Unified Framework for Music, Song, Audio Generation. ☆1,025 Updated last week
- zero-shot voice conversion & singing voice conversion, with real-time support ☆2,140 Updated last week
- Interface for OuteTTS models. ☆959 Updated last month
- An Open-Sourced LLM-empowered Foundation TTS System ☆659 Updated 5 months ago
- Di♪♪Rhythm: Blazingly Fast and Embarrassingly Simple End-to-End Full-Length Song Generation with Latent Diffusion ☆1,351 Updated this week
- Open-source industrial-grade ASR models supporting Mandarin, Chinese dialects and English, achieving a new SOTA on public Mandarin ASR be… ☆832 Updated last week
- An Industrial-Level Controllable and Efficient Zero-Shot Text-To-Speech System ☆406 Updated last week
- StreamSpeech is an “All in One” seamless model for offline and simultaneous speech recognition, speech translation and speech synthesis. ☆1,048 Updated 7 months ago
- Taming Stable Diffusion for Lip Sync! ☆3,424 Updated last week
- TTS with kokoro and onnx runtime ☆1,827 Updated this week
- https://hf.co/hexgrad/Kokoro-82M ☆2,063 Updated this week
- GLM-4-Voice | End-to-end Chinese-English spoken dialogue model ☆2,801 Updated 3 months ago
- first base model for full-duplex conversational audio ☆1,728 Updated 2 months ago
- 🍦 Speech-AI-Forge is a project developed around TTS generation model, implementing an API Server and a Gradio-based WebUI. ☆1,153 Updated last week
- ☆1,235 Updated 9 months ago
- Inference and training library for high-quality TTS models. ☆5,168 Updated 3 months ago
- ☆4,087 Updated 3 weeks ago
- open-source multimodal large language model that can hear, talk while thinking. Featuring real-time end-to-end speech input and streaming… ☆3,251 Updated 4 months ago
- [CVPR 2025] EchoMimicV2: Towards Striking, Simplified, and Semi-Body Human Animation ☆3,434 Updated last month
- A Repository for Single- and Multi-modal Speaker Verification, Speaker Recognition and Speaker Diarization ☆1,849 Updated 2 weeks ago
- Fast and accurate automatic speech recognition (ASR) for edge devices ☆2,653 Updated last month
- Qwen2.5-Omni is an end-to-end multimodal model by Qwen team at Alibaba Cloud, capable of understanding text, audio, vision, video, and pe… ☆2,049 Updated this week
- Official implementation of "Sonic: Shifting Focus to Global Audio Perception in Portrait Animation" ☆2,271 Updated 3 weeks ago
- LLaSA: Scaling Train-time and Inference-time Compute for LLaMA-based Speech Synthesis ☆487 Updated 3 weeks ago
- Moshi is a speech-text foundation model and full-duplex spoken dialogue framework. It uses Mimi, a state-of-the-art streaming neural audi… ☆7,956 Updated this week
- Controllable and fast Text-to-Speech for over 7000 languages! ☆1,572 Updated 4 months ago
- V-Express aims to generate a talking head video under the control of a reference image, an audio, and a sequence of V-Kps images. ☆2,319 Updated 2 months ago
- Multi-lingual large voice generation model, providing inference, training and deployment full-stack ability. ☆12,574 Updated last week