herimor / voxtream
VoXtream is a Full-Stream Zero-shot TTS model with Extremely Low Latency
☆171 · Updated last month
Alternatives and similar repositories for voxtream
Users who are interested in voxtream are comparing it to the repositories listed below.
- Automatically transcribe audio/video into high-quality, speaker-specific Text-To-Speech datasets ✨ ☆128 · Updated 3 months ago
- A package for NeuCodec: a 50 Hz, 0.8 kbps, 24 kHz audio codec. ☆120 · Updated last month
- ☆288 · Updated 4 months ago
- LLMVoX: Autoregressive Streaming Text-to-Speech Model for Any LLM ☆290 · Updated 6 months ago
- This repository contains a series of works on diffusion-based speech tokenizers, including the official implementation of the paper: "TaD… ☆193 · Updated 2 months ago
- ☆330 · Updated last month
- Official code for "F5R-TTS: Improving Flow-Matching based Text-to-Speech with Group Relative Policy Optimization" ☆128 · Updated 5 months ago
- [TAFFC 2025] The official implementation of EmoSphere++: Emotion-Controllable Zero-Shot Text-to-Speech via Emotion-Adaptive Spherical Vec… ☆111 · Updated 2 months ago
- Provide Gradio custom components to make the diarization-based audio labeling process easier and faster. ☆68 · Updated last month
- A TTS model capable of generating ultra-realistic dialogue in one pass. ☆127 · Updated 4 months ago
- Real-time Speech-Text Foundation Model Toolkit (wip) ☆248 · Updated 8 months ago
- VALL-E 2 reproduction ☆132 · Updated last year
- LongCat Audio Tokenizer and Detokenizer ☆252 · Updated last week
- SlamKit is an open source tool kit for efficient training of SpeechLMs. It was used for "Slamming: Training a Speech Language Model on On… ☆221 · Updated 6 months ago
- Open TTS models, built for streaming on the edge ☆44 · Updated 8 months ago
- High quality text-to-speech based on StyleTTS 2. ☆70 · Updated 3 weeks ago
- SoTA open-source TTS ☆114 · Updated 5 months ago
- An unofficial PyTorch implementation of VALL-E ☆88 · Updated 3 months ago
- A TTS model capable of generating ultra-realistic dialogue in one pass. ☆215 · Updated 7 months ago
- [NeurIPS '25] Benchmark for evaluating TTS models on complex prosodic, expressiveness, and linguistic challenges. ☆177 · Updated last month
- The official Implementation of PeriodWave and PeriodWave-Turbo ☆210 · Updated 7 months ago
- Codec for paper: LLaSA: Scaling Train-time and Inference-time Compute for LLaMA-based Speech Synthesis ☆328 · Updated 4 months ago
- DEX-TTS: Diffusion-based EXpressive TTS with Style Modeling on Time Variability ☆104 · Updated 10 months ago
- Automatically cleaning, enhancing, segmenting, filtering, and formatting a dataset to fine tune or train a voice model. ☆45 · Updated 2 months ago
- Official implementation of the TTS model Lina-Speech ☆175 · Updated 10 months ago
- A neural full-band audio codec for general audio sampled at 48 kHz with 7.5 kbps or 4.5 kbps. ☆193 · Updated 4 months ago
- An implementation for training the HiFi-GAN part of the XTTSv2 model using Coqui/TTS. ☆85 · Updated last year
- ☆32 · Updated 3 months ago
- Collection of Open Source Speech Data ☆163 · Updated last month
- Implementation of Sesame's Conversational Speech Model for Hugging Face Transformers ☆57 · Updated 6 months ago