OpenBMB / VoxCPM
VoxCPM: Tokenizer-Free TTS for Context-Aware Speech Generation and True-to-Life Voice Cloning
☆584 Updated this week
Alternatives and similar repositories for VoxCPM
Users who are interested in VoxCPM are comparing it to the libraries listed below.
- ☆458 Updated 4 months ago
- ☆447 Updated 4 months ago
- ☆516 Updated last month
- ☆311 Updated 5 months ago
- Long-form streaming TTS system for multi-speaker dialogue generation ☆434 Updated this week
- Long-form conversational TTS | Community fork ☆204 Updated last week
- Step-Audio 2 is an end-to-end multi-modal large language model designed for industry-strength audio understanding and speech conversation… ☆1,080 Updated this week
- ☆740 Updated last month
- ☆280 Updated last month
- Fast and High-Quality Zero-Shot Text-to-Speech with Flow Matching ☆614 Updated this week
- MOSS-TTSD is a spoken dialogue generation model that enables expressive dialogue speech synthesis in both Chinese and English, supporting… ☆952 Updated last week
- LLaSA: Scaling Train-time and Inference-time Compute for LLaMA-based Speech Synthesis ☆613 Updated 5 months ago
- TangoFlux: Super Fast and Faithful Text to Audio Generation with Flow Matching ☆784 Updated last month
- GPT-4o-level, real-time spoken dialogue system. ☆355 Updated 7 months ago
- [ICML 2025] SongGen: A Single Stage Auto-regressive Transformer for Text-to-Song Generation ☆260 Updated 2 months ago
- VoiceStar: Robust, Duration-controllable TTS that can Extrapolate ☆282 Updated 3 months ago
- ☆228 Updated 4 months ago
- AudioStory: Generating Long-Form Narrative Audio with Large Language Models ☆271 Updated 2 weeks ago
- LLMVoX: Autoregressive Streaming Text-to-Speech Model for Any LLM ☆278 Updated 4 months ago
- ☆246 Updated 3 weeks ago
- A fundamental toolkit designed for music, song, and audio generation ☆1,194 Updated 4 months ago
- Kyutai with an "eye" ☆218 Updated 5 months ago
- The showcase page of IndexTTS2 ☆153 Updated 2 months ago
- Official Python toolkit for the Qwen3-ASR API. Parallel high‑throughput calls, robust long‑audio transcription, multi‑sample‑rate support… ☆299 Updated this week
- High-quality Text-to-Audio Generation with Efficient Diffusion Transformer ☆308 Updated 2 months ago
- Dolphin is a multilingual, multitask ASR model jointly trained by DataoceanAI and Tsinghua University. ☆624 Updated 2 months ago
- Text-audio foundation model from Boson AI ☆104 Updated 2 weeks ago
- ☆632 Updated last month
- ☆526 Updated last week
- Extension of ChatTTS, 3x Faster on Windows, Support Voice Cloning and Mobile Deployment ☆170 Updated 7 months ago