playht / PlayDiffusion
☆480 Updated last week
Alternatives and similar repositories for PlayDiffusion
Users who are interested in PlayDiffusion are comparing it to the libraries listed below.
- ☆432 Updated last month
- VoiceStar: Robust, Duration-controllable TTS that can Extrapolate ☆258 Updated 3 weeks ago
- Kyutai with an "eye" ☆200 Updated 3 months ago
- A Fast TTS Engine ☆514 Updated 5 months ago
- ☆577 Updated this week
- TangoFlux: Super Fast and Faithful Text to Audio Generation with Flow Matching ☆752 Updated 2 weeks ago
- ☆407 Updated last month
- A real-time speech-to-speech chatbot powered by Whisper Small, Llama 3.2, and Kokoro-82M. ☆229 Updated 5 months ago
- High-performance Text-to-Speech server with OpenAI-compatible API, 8 voices, emotion tags, and modern web UI. Optimized for RTX GPUs. ☆440 Updated 2 months ago
- Examples of using the llasa-tts models locally ☆173 Updated 2 months ago
- Lightweight Gradio based WebUI for orpheusTTS - WSL / Linux [CUDA] ☆98 Updated 3 months ago
- Run Orpheus 3B Locally With LM Studio ☆428 Updated 3 months ago
- Speech-to-speech AI assistant with natural conversation flow, mid-speech interruption, vision capabilities and AI-initiated follow-ups. F… ☆170 Updated 2 months ago
- KeySync: A Robust Approach for Leakage-free Lip Synchronization in High Resolution ☆321 Updated last week
- Fast Streaming TTS with Orpheus + WebRTC (with FastRTC) ☆294 Updated 2 months ago
- Self-host the powerful Chatterbox TTS model. This server offers a user-friendly Web UI, flexible API endpoints (incl. OpenAI compatible),… ☆310 Updated last week
- Sesame CSM 1B Voice Cloning ☆305 Updated 3 months ago
- ☆181 Updated this week
- Modified version of Chatterbox that accepts text files as input and no character restrictions ☆278 Updated this week
- G2P ☆262 Updated last month
- Delayed Streams Modeling (DSM) is a flexible formulation for streaming, multimodal sequence-to-sequence learning. ☆211 Updated this week
- Self-host the powerful Dia TTS model. This server offers a user-friendly Web UI, flexible API endpoints (incl. OpenAI compatible), suppor… ☆255 Updated 3 weeks ago
- A TTS model capable of generating ultra-realistic dialogue in one pass. ☆179 Updated 2 months ago
- Unlock Pose Diversity: Accurate and Efficient Implicit Keypoint-based Spatiotemporal Diffusion for Audio-driven Talking Portrait ☆264 Updated last month
- Let Them Talk: Audio-Driven Multi-Person Conversational Video Generation ☆427 Updated this week
- Google's NotebookLM, but local ☆291 Updated 2 months ago
- LLaSA: Scaling Train-time and Inference-time Compute for LLaMA-based Speech Synthesis ☆577 Updated 2 months ago
- Streaming and Fine-tuning for Chatterbox TTS ☆109 Updated last week
- LiveCC: Learning Video LLM with Streaming Speech Transcription at Scale (CVPR 2025) ☆229 Updated last week
- The official repo for paper "Spatial Speech Translation: Translating Across Space With Binaural Hearables" ☆64 Updated last month