mbzuai-oryx / LLMVoX
LLMVoX: Autoregressive Streaming Text-to-Speech Model for Any LLM
☆224 Updated last week
Alternatives and similar repositories for LLMVoX:
Users who are interested in LLMVoX are comparing it to the repositories listed below.
- Codec for paper: LLaSA: Scaling Train-time and Inference-time Compute for LLaMA-based Speech Synthesis ☆238 Updated 3 weeks ago
- PyTorch implementation of Audio Flamingo 2: An Audio-Language Model with Long-Audio Understanding and Expert Reasoning Abilities. ☆427 Updated last week
- SlamKit is an open source tool kit for efficient training of SpeechLMs. It was used for "Slamming: Training a Speech Language Model on On… ☆186 Updated last week
- Real-time Speech-Text Foundation Model Toolkit (wip) ☆214 Updated last week
- ☆352 Updated 6 months ago
- VoiceRestore: Flow-Matching Transformers for Universal Speech Restoration ☆157 Updated last week
- VoiceBench: Benchmarking LLM-Based Voice Assistants ☆159 Updated this week
- Collection of Open Source Speech Data ☆152 Updated 4 months ago
- LLaSA: Scaling Train-time and Inference-time Compute for LLaMA-based Speech Synthesis ☆483 Updated 3 weeks ago
- Official repository of the paper "MuQ: Self-Supervised Music Representation Learning with Mel Residual Vector Quantization". ☆169 Updated 2 months ago
- G2P ☆182 Updated this week
- ☆210 Updated 2 weeks ago
- A family of state-of-the-art Transformer-based audio codecs for low-bitrate high-quality audio coding. ☆345 Updated last week
- Kyutai with an "eye" ☆160 Updated last week
- Multi-Scale Neural Audio Codec (SNAC) compresses audio into discrete codes at a low bitrate ☆531 Updated 4 months ago
- This repository contains the code and data for the paper EmoKnob: Enhance Voice Cloning with Fine-Grained Emotion Control by Haozhe Chen,… ☆68 Updated 5 months ago
- ☆254 Updated last year
- AudioBench: A Universal Benchmark for Audio Large Language Models ☆176 Updated this week
- This is an on-CPU real-time conversational system for two-way speech communication with AI models, utilizing a continuous streaming archi… ☆89 Updated last month
- Unified automatic quality assessment for speech, music, and sound. ☆434 Updated last week
- ☆88 Updated this week
- LlamaVoice is a llama-based large voice generation model, providing inference and training ability. ☆232 Updated 7 months ago
- StyleTTS-ZS: Efficient High-Quality Zero-Shot Text-to-Speech Synthesis with Distilled Time-Varying Style Diffusion ☆173 Updated 6 months ago
- Official implementation of the TTS model Lina-Speech ☆157 Updated 2 months ago
- The official GitHub page for the survey paper "Foundation Models for Music: A Survey". ☆198 Updated 6 months ago
- [Interspeech 2024] Whisper-Flamingo: Integrating Visual Features into Whisper for Audio-Visual Speech Recognition and Translation ☆145 Updated last month
- An unofficial PyTorch implementation of VALL-E ☆87 Updated this week
- Automatically Update Text-to-speech (TTS) Papers Daily using Github Actions (Update Every 12th hours) ☆392 Updated this week
- ✨✨Freeze-Omni: A Smart and Low Latency Speech-to-speech Dialogue Model with Frozen LLM ☆297 Updated 3 months ago
- Implementation of E2-TTS, "Embarrassingly Easy Fully Non-Autoregressive Zero-Shot TTS", in Pytorch ☆457 Updated 3 weeks ago