microsoft / VibeVoice
Frontier Open-Source Text-to-Speech
☆9,741 Updated last month
Alternatives and similar repositories for VibeVoice
Users who are interested in VibeVoice are comparing it to the libraries listed below.
- State-of-the-art TTS model under 25MB 😻 ☆8,999 Updated 2 months ago
- Kimi-Audio, an open-source audio foundation model excelling in audio understanding, generation, and conversation ☆4,301 Updated 4 months ago
- VoxCPM: Tokenizer-Free TTS for Context-Aware Speech Generation and True-to-Life Voice Cloning ☆1,882 Updated 2 weeks ago
- SoTA open-source TTS ☆14,223 Updated last month
- Real-time & local speech-to-text server. ☆7,963 Updated 2 weeks ago
- On-device TTS model by Neuphonic ☆3,641 Updated last week
- Wan: Open and Advanced Large-Scale Video Generative Models ☆10,375 Updated 2 weeks ago
- Kyutai's Speech-To-Text and Text-To-Speech models based on the Delayed Streams Modeling framework. ☆2,493 Updated last month
- OmniGen2: Exploration to Advanced Multimodal Generation. ☆3,915 Updated 3 weeks ago
- Unlimited-length talking video generation that supports image-to-video and video-to-video generation ☆2,743 Updated 2 months ago
- Qwen3-omni is a natively end-to-end, omni-modal LLM developed by the Qwen team at Alibaba Cloud, capable of understanding text, audio, im… ☆2,762 Updated 2 weeks ago
- Towards Human-Sounding Speech ☆5,670 Updated 5 months ago
- Text-audio foundation model from Boson AI ☆7,483 Updated last month
- ☆6,004 Updated 2 months ago
- SkyReels-V2: Infinite-length Film Generative model ☆4,800 Updated 2 months ago
- MiniMax-M1, the world's first open-weight, large-scale hybrid-attention reasoning model. ☆2,932 Updated 3 months ago
- A text-to-speech (TTS), speech-to-text (STT) and speech-to-speech (STS) library built on Apple's MLX framework, providing efficient speec… ☆2,760 Updated last week
- Multilingual Document Layout Parsing in a Single Vision-Language Model ☆5,432 Updated 2 weeks ago
- This repository contains the official implementation of "FastVLM: Efficient Vision Encoding for Vision Language Models" - CVPR 2025 ☆6,804 Updated 5 months ago
- Qwen-Image is a powerful image generation foundation model capable of complex text rendering and precise image editing. ☆5,762 Updated 3 weeks ago
- [NeurIPS 2025] Let Them Talk: Audio-Driven Multi-Person Conversational Video Generation ☆2,611 Updated last month
- ACE-Step: A Step Towards Music Generation Foundation Model ☆3,164 Updated 4 months ago
- https://hf.co/hexgrad/Kokoro-82M ☆4,627 Updated 2 months ago
- A simple yet powerful agent framework that delivers with open-source models ☆3,657 Updated this week
- Have a natural, spoken conversation with AI! ☆3,285 Updated 3 months ago
- Open-Source AI Presentation Generator and API (Gamma, Beautiful AI, Decktopus Alternative) ☆2,574 Updated 2 weeks ago
- GLM-4.5: Agentic, Reasoning, and Coding (ARC) Foundation Models ☆3,055 Updated 2 weeks ago
- The world's first open-source multimodal creative assistant. This is a substitute for Canva and Manus that prioritizes privacy and is usa… ☆5,030 Updated last month
- Zero-shot voice conversion & singing voice conversion, with real-time support ☆3,343 Updated 6 months ago
- A TTS model capable of generating ultra-realistic dialogue in one pass. ☆18,688 Updated 3 months ago