Soul-AILab/SoulX-Podcast

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/Soul-AILab/SoulX-Podcast)

Soul-AILab / SoulX-Podcast

SoulX-Podcast is an inference codebase by the Soul AI team for generating high-fidelity podcasts from text.

☆3,505

Alternatives and similar repositories for SoulX-Podcast

Users that are interested in SoulX-Podcast are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

index-tts / index-tts
View on GitHub
An Industrial-Level Controllable and Efficient Zero-Shot Text-To-Speech System
☆22,226Jul 14, 2026Updated 2 weeks ago
OpenMOSS / MOSS-TTSD
View on GitHub
MOSS-TTSD is a spoken dialogue generation model designed for expressive multi-speaker synthesis. It features long-context modeling, flex…
☆1,363Updated this week
ASLP-lab / VoiceSculptor
View on GitHub
An instruct text-to-speech solution based on LLaSA and CosyVoice2 developed by the ASLP lab and collaborators.
☆250Feb 26, 2026Updated 5 months ago
FireRedTeam / FireRedTTS2
View on GitHub
Long-form streaming TTS system for multi-speaker dialogue generation
☆1,416Oct 26, 2025Updated 9 months ago
MeiGen-AI / InfiniteTalk
View on GitHub
Unlimited-length talking video generation that supports image-to-video and video-to-video generation
☆7,536May 22, 2026Updated 2 months ago
Serverless GPU API endpoints on Runpod - Get Bonus Credits • Ad
Skip the infrastructure headaches. Auto-scaling, pay-as-you-go, no-ops approach lets you focus on innovating your application.
QwenAudio / CosyVoice
View on GitHub
Multi-lingual large voice generation model, providing inference, training and deployment full-stack ability.
☆22,464May 25, 2026Updated 2 months ago
zai-org / GLM-TTS
View on GitHub
GLM-TTS: Controllable & Emotion-Expressive Zero-shot TTS with Multi-Reward Reinforcement Learning
☆1,046Apr 10, 2026Updated 3 months ago
Soul-AILab / SoulX-FlashTalk
View on GitHub
SoulX-FlashTalk is the first 14B model to achieve sub-second start-up latency (0.87s) while maintaining a real-time throughput of 32 FPS …
☆1,428May 21, 2026Updated 2 months ago
stepfun-ai / Step-Audio-EditX
View on GitHub
A powerful 3B-parameter, LLM-based Reinforcement Learning audio edit model excels at editing emotion, speaking style, and paralinguistics…
☆955Apr 9, 2026Updated 3 months ago
Soul-AILab / SoulX-Duplug
View on GitHub
Plug-and-play streaming semantic VAD for real-time full-duplex spoken dialogue systems.
☆278Jul 17, 2026Updated last week
Soul-AILab / SoulX-Singer
View on GitHub
Official inference code for SoulX-Singer: Towards High-Quality Zero-Shot Singing Voice Synthesis
☆854May 29, 2026Updated 2 months ago
Soul-AILab / SoulX-Transcriber
View on GitHub
An end-to-end framework for multi-speaker transcription that jointly models who spoke, when, and what.
☆284Jun 22, 2026Updated last month
inclusionAI / Ming-UniAudio
View on GitHub
Ming-UniAudio: Speech LLM for Joint Understanding, Generation and Editing with Unified Representation
☆451Nov 27, 2025Updated 8 months ago
OpenBMB / VoxCPM
View on GitHub
VoxCPM2: Tokenizer-Free TTS for Multilingual Speech Generation, Creative Voice Design, and True-to-Life Cloning
☆34,391Jul 8, 2026Updated 3 weeks ago
1-Click AI Models by DigitalOcean Gradient • Ad
Deploy popular AI models on DigitalOcean Gradient GPU virtual machines with just a single click. Zero configuration with optimized deployments.
XiaomiMiMo / MiMo-Audio
View on GitHub
MiMo-Audio: Audio Language Models are Few-Shot Learners
☆1,068Jun 17, 2026Updated last month
GiantAILab / DiaMoE-TTS
View on GitHub
Official code for"DiaMoE-TTS: A Unified IPA-based Dialect TTS Framework with Mixture-of-Experts and Parameter-Efficient Zero-Shot Adaptat…
☆246Nov 28, 2025Updated 8 months ago
xingchensong / FlashCosyVoice
View on GitHub
FlashCosyVoice: A lightweight vLLM implementation built from scratch for CosyVoice.
☆250Feb 25, 2026Updated 5 months ago
stepfun-ai / Step-Audio2
View on GitHub
Step-Audio 2 is an end-to-end multi-modal large language model designed for industry-strength audio understanding and speech conversation…
☆1,485Mar 16, 2026Updated 4 months ago
jianyun8023 / soulx-tts-metal
View on GitHub
SoulX Podcast TTS Metal Test
☆66Oct 30, 2025Updated 8 months ago
fishaudio / fish-speech
View on GitHub
SOTA Open Source TTS
☆31,398Updated this week
inclusionAI / Ming-omni-tts
View on GitHub
Ming-omni-tts: Simple and Efficient Unified Generation of Speech, Music, and Sound with Precise Control
☆264Feb 26, 2026Updated 5 months ago
k2-fsa / ZipVoice
View on GitHub
Fast and High-Quality Zero-Shot Text-to-Speech with Flow Matching
☆1,021Dec 2, 2025Updated 7 months ago
ASLP-lab / MeanVC
View on GitHub
A Lightweight and Streaming Zero-Shot Voice Conversion via Mean Flows
☆298Jan 8, 2026Updated 6 months ago
Deploy to Railway using AI coding agents - Free Credits Offer • Ad
Use Claude Code, Codex, OpenCode, and more. Autonomous software development now has the infrastructure to match with Railway.
k2-fsa / Flow2GAN
View on GitHub
Hybrid Flow Matching and GAN with Multi-Resolution Network for Few-Step High-Fidelity Audio Generation
☆146Mar 8, 2026Updated 4 months ago
stepfun-ai / Step-Audio
View on GitHub
☆34Mar 16, 2026Updated 4 months ago
Ksuriuri / index-tts-vllm
View on GitHub
Added vLLM support to IndexTTS for faster inference.
☆1,214Apr 13, 2026Updated 3 months ago
QwenAudio / Fun-Audio-Chat
View on GitHub
Fun-Audio-Chat is a Large Audio Language Model built for natural, low-latency voice interactions.
☆985Feb 27, 2026Updated 5 months ago
SparkAudio / Spark-TTS
View on GitHub
Spark-TTS Inference Code
☆11,001Apr 9, 2025Updated last year
bigai-nlco / UltraVoice
View on GitHub
Official Repository of UltraVoice
☆63Oct 28, 2025Updated 9 months ago
jzq2000 / MoonCast
View on GitHub
☆348Apr 11, 2025Updated last year
microsoft / VibeVoice
View on GitHub
Open-Source Frontier Voice AI
☆50,834Updated this week
Soul-AILab / SAC
View on GitHub
[ACL 2026 Main] Training, inference, and testing of the SAC speech codec model.
☆108Nov 1, 2025Updated 8 months ago
GPU virtual machines on DigitalOcean Gradient AI • Ad
Get to production fast with high-performance AMD and NVIDIA GPUs you can spin up in seconds. The definition of operational simplicity.
ASLP-lab / MINT-Bench
View on GitHub
☆49May 2, 2026Updated 2 months ago
xingchensong / S3Tokenizer
View on GitHub
Reverse Engineering of Supervised Semantic Speech Tokenizer (S3Tokenizer) proposed in CosyVoice
☆521Dec 22, 2025Updated 7 months ago
meituan-longcat / LongCat-AudioDiT
View on GitHub
☆554Apr 3, 2026Updated 3 months ago
wenet-e2e / west
View on GitHub
We Speech Toolkit, LLM based Speech Toolkit for Speech Understanding, Generation, and Interaction
☆206Jul 17, 2026Updated last week
FireRedTeam / FireRedTTS
View on GitHub
An Open-Sourced LLM-empowered Foundation TTS System
☆909Sep 28, 2025Updated 10 months ago
666ghj / BettaFish
View on GitHub
微舆：人人可用的多Agent舆情分析助手，打破信息茧房，还原舆情原貌，预测未来走向，辅助决策！从0实现，不依赖任何框架。
☆41,878Jul 21, 2026Updated last week
2noise / ChatTTS
View on GitHub
A generative speech model for daily dialogue.
☆39,689Apr 10, 2026Updated 3 months ago