herimor / voxtream
VoXtream is a Full-Stream Zero-shot TTS model with Extremely Low Latency
☆151 · Updated this week
Alternatives and similar repositories for voxtream
Users who are interested in voxtream are comparing it to the libraries listed below.
- Automatically transcribe audio/video into high-quality, speaker-specific Text-To-Speech datasets ✨ ☆127 · Updated 2 months ago
- ☆285 · Updated 2 months ago
- This repository contains a series of works on diffusion-based speech tokenizers, including the official implementation of the paper: "TaD… ☆183 · Updated 3 weeks ago
- Official code for "F5R-TTS: Improving Flow-Matching based Text-to-Speech with Group Relative Policy Optimization" ☆118 · Updated 4 months ago
- [TAFFC 2025] The official implementation of EmoSphere++: Emotion-Controllable Zero-Shot Text-to-Speech via Emotion-Adaptive Spherical Vec… ☆109 · Updated last month
- ☆309 · Updated last week
- LLMVoX: Autoregressive Streaming Text-to-Speech Model for Any LLM ☆284 · Updated 5 months ago
- [NeurIPS' 25] Benchmark for evaluating TTS models on complex prosodic, expressiveness, and linguistic challenges. ☆164 · Updated last week
- A TTS model capable of generating ultra-realistic dialogue in one pass. ☆126 · Updated 2 months ago
- Real-time Speech-Text Foundation Model Toolkit (wip) ☆247 · Updated 6 months ago
- VALL-E 2 reproduction ☆131 · Updated last year
- This repository contains the code and data for the paper EmoKnob: Enhance Voice Cloning with Fine-Grained Emotion Control by Haozhe Chen,… ☆80 · Updated last year
- SlamKit is an open source tool kit for efficient training of SpeechLMs. It was used for "Slamming: Training a Speech Language Model on On… ☆218 · Updated 4 months ago
- Provide Gradio custom components to make the diarization-based audio labeling process easier and faster. ☆68 · Updated last month
- Codec for paper: LLaSA: Scaling Train-time and Inference-time Compute for LLaMA-based Speech Synthesis ☆314 · Updated 2 months ago
- ☆234 · Updated 4 months ago
- A package for NeuCodec: a 50 Hz, 0.8 kbps, 24 kHz audio codec. ☆99 · Updated last week
- ☆50 · Updated 6 months ago
- DEX-TTS: Diffusion-based EXpressive TTS with Style Modeling on Time Variability ☆102 · Updated 8 months ago
- ZMM-TTS: Zero-shot Multilingual and Multispeaker Speech Synthesis Conditioned on Self-supervised Discrete Speech Representations ☆175 · Updated last year
- ☆37 · Updated 6 months ago
- Official implementation of the TTS model Lina-Speech ☆170 · Updated 9 months ago
- This is the M-AILABS Speech Dataset ☆87 · Updated 10 months ago
- ☆99 · Updated 2 weeks ago
- SSR-Speech: Towards Stable, Safe and Robust Zero-shot Speech Editing and Synthesis ☆139 · Updated 9 months ago
- Fine-tune the LLM part of the Spark-TTS model ☆111 · Updated 6 months ago
- The official implementation of PeriodWave and PeriodWave-Turbo ☆207 · Updated 6 months ago
- High quality text-to-speech based on StyleTTS 2. ☆65 · Updated this week
- This project trains an RWKV LLM for TTS generation that is compatible with other TTS engines (like fish/cosy/chattts). ☆87 · Updated last week
- An implementation for training the HiFi-GAN part of the XTTSv2 model using Coqui/TTS. ☆85 · Updated 11 months ago