OpenBMB / VoxCPM
VoxCPM: Tokenizer-Free TTS for Context-Aware Speech Generation and True-to-Life Voice Cloning
☆5,715 · Updated 2 weeks ago
Alternatives and similar repositories for VoxCPM
Users who are interested in VoxCPM are comparing it to the repositories listed below.
- On-device TTS model by Neuphonic ☆4,768 · Updated this week
- Qwen3-TTS is an open-source series of TTS models developed by the Qwen team at Alibaba Cloud, supporting stable, expressive, and streamin… ☆6,994 · Updated this week
- Unlimited-length talking video generation that supports image-to-video and video-to-video generation ☆4,750 · Updated last month
- Official Python inference and LoRA trainer package for the LTX-2 audio–video generative model. ☆3,485 · Updated last week
- Lightning-Fast, On-Device, Multilingual TTS, running natively via ONNX. ☆2,552 · Updated 2 weeks ago
- [NeurIPS 2025] Let Them Talk: Audio-Driven Multi-Person Conversational Video Generation ☆2,794 · Updated last month
- GLM-TTS: Controllable & Emotion-Expressive Zero-shot TTS with Multi-Reward Reinforcement Learning ☆918 · Updated last month
- A TTS that fits in your CPU (and pocket) ☆2,995 · Updated this week
- Official Python toolkit for the Qwen3-ASR API. Parallel high‑throughput calls, robust long‑audio transcription, multi‑sample‑rate support… ☆800 · Updated 3 months ago
- SoulX-Podcast is an inference codebase by the Soul AI team for generating high-fidelity podcasts from text. ☆3,125 · Updated last month
- Step-Audio 2 is an end-to-end multi-modal large language model designed for industry-strength audio understanding and speech conversation… ☆1,330 · Updated 4 months ago
- Qwen3-ASR is an open-source series of ASR models developed by the Qwen team at Alibaba Cloud, supporting stable multilingual speech/music… ☆1,317 · Updated last week
- PersonaLive! : Expressive Portrait Image Animation for Live Streaming ☆1,612 · Updated last month
- The repository provides code for running inference with the Meta Segment Anything Audio Model (SAM-Audio), links for downloading the trai… ☆3,256 · Updated last month
- Kyutai's Speech-To-Text and Text-To-Speech models based on the Delayed Streams Modeling framework. ☆2,832 · Updated last week
- "ViMax: Agentic Video Generation (Director, Screenwriter, Producer, and Video Generator All-in-One)" ☆2,239 · Updated last month
- Fun-Audio-Chat is a Large Audio Language Model built for natural, low-latency voice interactions. ☆789 · Updated last week
- Omnilingual ASR: Open-Source Multilingual Speech Recognition for 1600+ Languages ☆2,620 · Updated last month
- The official code repository for LeVo: High-Quality Song Generation with Multi-Preference Alignment ☆1,337 · Updated last month
- A powerful 3B-parameter, LLM-based reinforcement learning audio editing model that excels at editing emotion, speaking style, and paralinguistics… ☆862 · Updated last week
- VibeVoice: Expressive, longform conversational speech synthesis. (Community fork) ☆964 · Updated 2 weeks ago
- TTS model capable of streaming conversational audio in realtime. ☆1,051 · Updated 2 months ago
- A quick vibe coded app for deepseek OCR ☆1,714 · Updated 2 months ago
- SkyReels-V2: Infinite-length Film Generative model ☆6,212 · Updated last week
- Kimi-Audio, an open-source audio foundation model excelling in audio understanding, generation, and conversation ☆4,477 · Updated 7 months ago
- Open-Source Frontier Voice AI ☆22,955 · Updated this week
- ☆2,024 · Updated last month
- MOSS-TTSD is a spoken dialogue generation model that enables expressive dialogue speech synthesis in both Chinese and English, supporting… ☆1,091 · Updated 2 months ago
- GLM-ASR-Nano: A robust, open-source speech recognition model with 1.5B parameters ☆724 · Updated last month
- Zero-shot voice conversion & singing voice conversion, with real-time support ☆3,575 · Updated 9 months ago