SparkAudio / Spark-TTS
Spark-TTS Inference Code
☆9,041Updated 3 weeks ago
Alternatives and similar repositories for Spark-TTS:
Users who are interested in Spark-TTS are comparing it to the libraries listed below:
- ☆4,245Updated last month
- SOTA Open Source TTS☆20,921Updated 3 weeks ago
- Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"☆11,659Updated this week
- Multi-lingual large voice generation model, providing inference, training and deployment full-stack ability.☆13,515Updated this week
- Taming Stable Diffusion for Lip Sync!☆3,854Updated last week
- Multilingual Voice Understanding Model☆5,511Updated last month
- An Industrial-Level Controllable and Efficient Zero-Shot Text-To-Speech System☆1,556Updated last week
- A simple local web interface that uses ChatTTS to synthesize text into speech and also supports exposing an external API.☆6,998Updated 2 weeks ago
- A video translation and dubbing tool powered by LLMs, offering professional-grade translations and one-click full-process deployment. It…☆6,528Updated this week
- Towards Human-Sounding Speech☆4,633Updated 2 weeks ago
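- AigcPanel is a simple, easy-to-use, one-stop AI digital human system that supports video synthesis, voice synthesis, and voice cloning, and simplifies local model management with one-click import and use of AI models.☆3,183Updated 2 weeks ago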
- ☆7,799Updated this week
- ☆5,070Updated 3 weeks ago
- Official implementation of "Sonic: Shifting Focus to Global Audio Perception in Portrait Animation"☆2,629Updated last month
- A Flexible Framework for Experiencing Cutting-edge LLM Inference Optimizations☆13,844Updated this week
- Dockerized FastAPI wrapper for Kokoro-82M text-to-speech model w/CPU ONNX and NVIDIA GPU PyTorch support, handling, and auto-stitching☆2,527Updated last week
- Toolkit for linearizing PDFs for LLM datasets/training☆12,238Updated this week
- A GUI Agent application based on UI-TARS(Vision-Language Model) that allows you to control your computer using natural language.☆13,137Updated this week
- A high-quality, one-stop open-source data extraction tool for converting PDF to Markdown and JSON.☆32,686Updated this week
- A generative speech model for daily dialogue.☆36,024Updated last month
- Kimi-Audio, an open-source audio foundation model excelling in audio understanding, generation, and conversation☆3,204Updated this week
- zero-shot voice conversion & singing voice conversion, with real-time support☆2,387Updated 2 weeks ago
- GLM-4-Voice | End-to-end Chinese-English spoken dialogue model☆2,884Updated 5 months ago
- 🔥 Turn entire websites into LLM-ready markdown or structured data. Scrape, crawl and extract with a single API.☆37,357Updated this week
- A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity…☆10,186Updated 2 weeks ago
- ☆2,907Updated last month
- Gradio WebUI for creators and developers, featuring key TTS (Edge-TTS, kokoro) and zero-shot Voice Cloning (E2 & F5-TTS, CosyVoice), with…☆3,635Updated last week
- TTS with kokoro and onnx runtime☆1,938Updated 3 weeks ago
- A simple screen parsing tool towards pure vision based GUI agent☆21,888Updated last month
- Qwen2.5-Omni is an end-to-end multimodal model by Qwen team at Alibaba Cloud, capable of understanding text, audio, vision, video, and pe…☆2,812Updated this week