mbzuai-oryx / LLMVoX
LLMVoX: Autoregressive Streaming Text-to-Speech Model for Any LLM
☆244 Updated last month
Alternatives and similar repositories for LLMVoX
Users who are interested in LLMVoX are comparing it to the repositories listed below
- ☆210 Updated last month
- Real-time Speech-Text Foundation Model Toolkit (wip) ☆228 Updated last month
- ☆132 Updated last week
- Codec for paper: LLaSA: Scaling Train-time and Inference-time Compute for LLaMA-based Speech Synthesis ☆264 Updated 2 months ago
- SlamKit is an open source tool kit for efficient training of SpeechLMs. It was used for "Slamming: Training a Speech Language Model on On… ☆206 Updated last week
- PyTorch implementation of Audio Flamingo 2: An Audio-Language Model with Long-Audio Understanding and Expert Reasoning Abilities. ☆467 Updated 2 weeks ago
- VoiceStar: Robust, Duration-controllable TTS that can Extrapolate ☆176 Updated last month
- ☆359 Updated 8 months ago
- Collection of Open Source Speech Data ☆154 Updated 6 months ago
- VoiceBench: Benchmarking LLM-Based Voice Assistants ☆193 Updated last week
- Kyutai with an "eye" ☆190 Updated last month
- VoiceRestore: Flow-Matching Transformers for Universal Speech Restoration ☆163 Updated 3 weeks ago
- StyleTTS-ZS: Efficient High-Quality Zero-Shot Text-to-Speech Synthesis with Distilled Time-Varying Style Diffusion ☆176 Updated 7 months ago
- Automatically Update Text-to-speech (TTS) Papers Daily using Github Actions (Update Every 12th hours) ☆430 Updated this week
- ☆124 Updated last month
- A TTS model capable of generating ultra-realistic dialogue in one pass. ☆52 Updated this week
- ☆256 Updated last year
- Multi-Scale Neural Audio Codec (SNAC) compresses audio into discrete codes at a low bitrate ☆588 Updated 5 months ago
- Implementation of Sesame's Conversational Speech Model for Hugging Face Transformers ☆54 Updated last month
- Streaming and Finetuning code for CSM ☆276 Updated this week
- Official implementation of the TTS model Lina-Speech ☆165 Updated 4 months ago
- Whisper-Flamingo [Interspeech 2024] and mWhisper-Flamingo [IEEE SPL 2025] for Audio-Visual Speech Recognition and Translation ☆155 Updated last week
- LLaSA: Scaling Train-time and Inference-time Compute for LLaMA-based Speech Synthesis ☆551 Updated last month
- This repository contains the code and data for the paper EmoKnob: Enhance Voice Cloning with Fine-Grained Emotion Control by Haozhe Chen,… ☆71 Updated 7 months ago
- ☆289 Updated last week
- G2P ☆232 Updated 2 weeks ago
- Provide Gradio custom components to make the diarization-based audio labeling process easier and faster. ☆62 Updated last month
- AudioBench: A Universal Benchmark for Audio Large Language Models ☆205 Updated last month
- A family of state-of-the-art Transformer-based audio codecs for low-bitrate high-quality audio coding. ☆366 Updated 3 weeks ago
- Audio tokenization, in the fastest way possible! ☆52 Updated 8 months ago