herimor / voxtream
VoXtream is a Full-Stream Zero-shot TTS model with Extremely Low Latency
☆176 Updated last month
Alternatives and similar repositories for voxtream
Users that are interested in voxtream are comparing it to the libraries listed below
- Automatically transcribe audio/video into high-quality, speaker-specific Text-To-Speech datasets ✨ ☆130 Updated 4 months ago
- A package for NeuCodec: a 50 Hz, 0.8 kbps, 24 kHz audio codec. ☆133 Updated 2 months ago
- ☆343 Updated 2 months ago
- ☆289 Updated 4 months ago
- This repository contains a series of works on diffusion-based speech tokenizers, including the official implementation of the paper: "TaD… ☆195 Updated 2 months ago
- LLMVoX: Autoregressive Streaming Text-to-Speech Model for Any LLM ☆291 Updated 7 months ago
- Real-time Speech-Text Foundation Model Toolkit (wip) ☆250 Updated 8 months ago
- Official code for "F5R-TTS: Improving Flow-Matching based Text-to-Speech with Group Relative Policy Optimization" ☆140 Updated 6 months ago
- A TTS model capable of generating ultra-realistic dialogue in one pass. ☆127 Updated 4 months ago
- ☆218 Updated 2 months ago
- ☆249 Updated 7 months ago
- Provide Gradio custom components to make the diarization-based audio labeling process easier and faster. ☆69 Updated last month
- [NeurIPS '25] Benchmark for evaluating TTS models on complex prosodic, expressiveness, and linguistic challenges. ☆180 Updated last week
- Automatically cleaning, enhancing, segmenting, filtering, and formatting a dataset to fine tune or train a voice model. ☆45 Updated 3 months ago
- Codec for paper: LLaSA: Scaling Train-time and Inference-time Compute for LLaMA-based Speech Synthesis ☆334 Updated 4 months ago
- VALL-E 2 reproduction ☆133 Updated last year
- An unofficial PyTorch implementation of VALL-E ☆88 Updated 4 months ago
- A TTS model capable of generating ultra-realistic dialogue in one pass. ☆219 Updated 7 months ago
- SoTA open-source TTS ☆120 Updated 6 months ago
- Official implementation of the TTS model Lina-Speech ☆175 Updated 11 months ago
- Implementation of Sesame's Conversational Speech Model for Hugging Face Transformers ☆57 Updated 7 months ago
- A TTS model that makes a speaker speak new languages ☆76 Updated last year
- ☆103 Updated 2 months ago
- A neural full-band audio codec for general audio sampled at 48 kHz with 7.5 kbps or 4.5 kbps. ☆194 Updated 5 months ago
- The official implementation of PeriodWave and PeriodWave-Turbo ☆213 Updated 8 months ago
- [TAFFC 2025] The official implementation of EmoSphere++: Emotion-Controllable Zero-Shot Text-to-Speech via Emotion-Adaptive Spherical Vec… ☆111 Updated 3 months ago
- A Massive Multilingual Multi-speaker Speech Corpus for Scaling Indian TTS ☆51 Updated last year
- Fine-tune the LLM part of the Spark-TTS model ☆112 Updated 8 months ago
- An implementation for training the HiFi-GAN part of the XTTSv2 model using Coqui/TTS. ☆86 Updated last year
- This repository contains the code and data for the paper EmoKnob: Enhance Voice Cloning with Fine-Grained Emotion Control by Haozhe Chen,… ☆80 Updated last year