OpenBMB / VoxCPM
VoxCPM: Tokenizer-Free TTS for Context-Aware Speech Generation and True-to-Life Voice Cloning
☆3,195 · Updated this week
Alternatives and similar repositories for VoxCPM
Users who are interested in VoxCPM are comparing it to the libraries listed below.
- Unlimited-length talking video generation that supports image-to-video and video-to-video generation ☆4,078 · Updated 2 weeks ago
- MOSS-TTSD is a spoken dialogue generation model that enables expressive dialogue speech synthesis in both Chinese and English, supporting… ☆1,064 · Updated 3 weeks ago
- GLM-TTS: Controllable & Emotion-Expressive Zero-shot TTS with Multi-Reward Reinforcement Learning ☆793 · Updated 2 weeks ago
- Step-Audio 2 is an end-to-end multi-modal large language model designed for industry-strength audio understanding and speech conversation… ☆1,277 · Updated 3 months ago
- Added vLLM support to IndexTTS for faster inference. ☆974 · Updated 2 months ago
- SoulX-Podcast is an inference codebase by the Soul AI team for generating high-fidelity podcasts from text. ☆2,892 · Updated 3 weeks ago
- [NeurIPS 2025] Let Them Talk: Audio-Driven Multi-Person Conversational Video Generation ☆2,752 · Updated 2 weeks ago
- Zero-shot voice conversion & singing voice conversion, with real-time support ☆3,489 · Updated 8 months ago
- A fundamental toolkit designed for music, song, and audio generation ☆1,274 · Updated 7 months ago
- Di♪♪Rhythm: Blazingly Fast and Embarrassingly Simple End-to-End Full-Length Song Generation with Latent Diffusion ☆2,168 · Updated last month
- An Open-Sourced LLM-empowered Foundation TTS System ☆892 · Updated 3 months ago
- A powerful 3B-parameter, LLM-based Reinforcement Learning audio edit model excels at editing emotion, speaking style, and paralinguistics… ☆791 · Updated last week
- The official code repository for LeVo: High-Quality Song Generation with Multi-Preference Alignment ☆1,076 · Updated 2 weeks ago
- Implementation of "Live Avatar: Streaming Real-time Audio-Driven Avatar Generation with Infinite Length" ☆1,206 · Updated 2 weeks ago
- Official Python toolkit for the Qwen3-ASR API. Parallel high-throughput calls, robust long-audio transcription, multi-sample-rate support… ☆735 · Updated 2 months ago
- Provides high-quality Chinese speech synthesis and voice cloning services based on models such as SparkTTS and OrpheusTTS. ☆568 · Updated 7 months ago
- ☆472 · Updated 7 months ago
- ☆1,968 · Updated 2 weeks ago
- VibeVoice: Expressive, longform conversational speech synthesis. (Community fork) ☆882 · Updated last week
- Voice Activity Detector (VAD): low-latency, high-performance and lightweight ☆1,851 · Updated last week
- MiMo-Audio: Audio Language Models are Few-Shot Learners ☆912 · Updated 3 months ago
- Open-source industrial-grade ASR models supporting Mandarin, Chinese dialects and English, achieving a new SOTA on public Mandarin ASR be… ☆1,696 · Updated last week
- Kimi-Audio, an open-source audio foundation model excelling in audio understanding, generation, and conversation ☆4,417 · Updated 6 months ago
- Fast and High-Quality Zero-Shot Text-to-Speech with Flow Matching ☆750 · Updated 3 weeks ago
- Official implementation of "Sonic: Shifting Focus to Global Audio Perception in Portrait Animation" ☆3,151 · Updated 6 months ago
- An Industrial-Level Controllable and Efficient Zero-Shot Text-To-Speech System ☆17,260 · Updated 3 weeks ago
- ☆498 · Updated 3 months ago
- ☆659 · Updated 2 months ago
- [NeurIPS 2025] OmniTalker: Real-Time Text-Driven Talking Head Generation with In-Context Audio-Visual Style Replication ☆408 · Updated 3 months ago
- Dolphin is a multilingual, multitask ASR model jointly trained by DataoceanAI and Tsinghua University. ☆684 · Updated last month