OpenBMB / VoxCPM
VoxCPM: Tokenizer-Free TTS for Context-Aware Speech Generation and True-to-Life Voice Cloning
☆5,067Updated last week
Alternatives and similar repositories for VoxCPM
Users who are interested in VoxCPM are comparing it to the repositories listed below.
- Unlimited-length talking video generation that supports image-to-video and video-to-video generation☆4,563Updated last month
- Qwen3-TTS is an open-source series of TTS models developed by the Qwen team at Alibaba Cloud, supporting stable, expressive, and streamin…☆2,333Updated this week
- On-device TTS model by Neuphonic☆4,679Updated last week
- [NeurIPS 2025] Let Them Talk: Audio-Driven Multi-Person Conversational Video Generation☆2,784Updated last month
- Step-Audio 2 is an end-to-end multi-modal large language model designed for industry-strength audio understanding and speech conversation…☆1,313Updated 4 months ago
- GLM-TTS: Controllable & Emotion-Expressive Zero-shot TTS with Multi-Reward Reinforcement Learning☆889Updated last month
- Official Python toolkit for the Qwen3-ASR API. Parallel high‑throughput calls, robust long‑audio transcription, multi‑sample‑rate support…☆742Updated 3 months ago
- Implementation of "Live Avatar: Streaming Real-time Audio-Driven Avatar Generation with Infinite Length"☆1,460Updated this week
- Lightning-Fast, On-Device, Multilingual TTS — running natively via ONNX.☆2,407Updated last week
- Omnilingual ASR: Open-Source Multilingual Speech Recognition for 1600+ Languages☆2,611Updated 3 weeks ago
- Fun-Audio-Chat is a Large Audio Language Model built for natural, low-latency voice interactions.☆711Updated this week
- SoulX-Podcast is an inference codebase by the Soul AI team for generating high-fidelity podcasts from text.☆3,098Updated last month
- A powerful 3B-parameter, LLM-based reinforcement learning audio editing model that excels at editing emotion, speaking style, and paralinguistics…☆829Updated this week
- The official code repository for LeVo: High-Quality Song Generation with Multi-Preference Alignment☆1,312Updated last month
- Voice Activity Detector (VAD): low-latency, high-performance, and lightweight☆1,933Updated last month
- The repository provides code for running inference with the Meta Segment Anything Audio Model (SAM-Audio), links for downloading the trai…☆3,183Updated 3 weeks ago
- ☆4,604Updated last month
- "ViMax: Agentic Video Generation (Director, Screenwriter, Producer, and Video Generator All-in-One)"☆1,884Updated last month
- Kyutai's Speech-To-Text and Text-To-Speech models based on the Delayed Streams Modeling framework.☆2,793Updated 2 months ago
- Official Python inference and LoRA trainer package for the LTX-2 audio–video generative model.☆2,900Updated last week
- VibeVoice: Expressive, longform conversational speech synthesis. (Community fork)☆931Updated last month
- The world's first open-source multimodal creative assistant. This is a substitute for Canva and Manus that prioritizes privacy and is usa…☆5,792Updated 2 months ago
- MiMo-Audio: Audio Language Models are Few-Shot Learners☆962Updated 4 months ago
- Zero-shot voice conversion and singing voice conversion, with real-time support☆3,548Updated 9 months ago
- Open-Source Frontier Voice AI☆20,973Updated this week
- PersonaLive! : Expressive Portrait Image Animation for Live Streaming☆1,509Updated 3 weeks ago
- ☆2,007Updated last month
- PersonaPlex code.☆3,110Updated this week
- TTS model capable of streaming conversational audio in realtime.☆1,023Updated last month
- GLM-ASR-Nano: A robust, open-source speech recognition model with 1.5B parameters☆684Updated 3 weeks ago