boson-ai / higgs-audio-vllm
Forked vLLM that supports higgs-audio model
☆30 · Updated this week
Alternatives and similar repositories for higgs-audio-vllm
Users who are interested in higgs-audio-vllm are comparing it to the repositories listed below.
- Automatically transcribe audio/video into high-quality, speaker-specific Text-To-Speech datasets ✨ ☆123 · Updated last month
- A TTS model capable of generating ultra-realistic dialogue in one pass. ☆120 · Updated last month
- Codec for paper: LLaSA: Scaling Train-time and Inference-time Compute for LLaMA-based Speech Synthesis ☆298 · Updated last month
- This repository contains the code and data for the paper EmoKnob: Enhance Voice Cloning with Fine-Grained Emotion Control by Haozhe Chen,… ☆78 · Updated 11 months ago
- ☆278 · Updated last month
- Official implementation of the TTS model Lina-Speech ☆168 · Updated 8 months ago
- Real-time Speech-Text Foundation Model Toolkit (wip) ☆244 · Updated 5 months ago
- Official code for "F5R-TTS: Improving Flow-Matching based Text-to-Speech with Group Relative Policy Optimization" ☆117 · Updated 3 months ago
- Automatically cleaning, enhancing, segmenting, filtering, and formatting a dataset to fine tune or train a voice model. ☆42 · Updated this week
- Awesome music generation model: MG² ☆159 · Updated 5 months ago
- VoiceRestore: Flow-Matching Transformers for Universal Speech Restoration ☆184 · Updated 4 months ago
- This repository contains a series of works on diffusion-based speech tokenizers, including the official implementation of the paper: "TaD… ☆156 · Updated last week
- A TTS model capable of generating ultra-realistic dialogue in one pass. ☆205 · Updated 4 months ago
- An unofficial PyTorch implementation of VALL-E ☆88 · Updated last month
- ACM MM 2023 CoMoSpeech: One-Step Speech and Singing Voice Synthesis via Consistency Model ☆211 · Updated last year
- StyleTTS-ZS: Efficient High-Quality Zero-Shot Text-to-Speech Synthesis with Distilled Time-Varying Style Diffusion ☆185 · Updated 11 months ago
- An implementation for training the HiFi-GAN part of the XTTSv2 model using Coqui/TTS. ☆85 · Updated 10 months ago
- VALL-E 2 reproduction ☆129 · Updated last year
- Tooling to build datasets for audio model training ☆16 · Updated last year
- A text-conditional diffusion probabilistic model capable of generating high-fidelity audio. ☆175 · Updated last year
- Provide Gradio custom components to make the diarization-based audio labeling process easier and faster. ☆68 · Updated last month
- High-quality Text-to-Audio Generation with Efficient Diffusion Transformer ☆308 · Updated 2 months ago
- LLMVoX: Autoregressive Streaming Text-to-Speech Model for Any LLM ☆273 · Updated 3 months ago
- Text-audio foundation model from Boson AI ☆95 · Updated last week
- In this repository I will be running various experiments on fine-tuning different parts of XTTS ☆15 · Updated last year
- The official implementation of PeriodWave and PeriodWave-Turbo ☆206 · Updated 4 months ago
- [TAFFC 2025] The official implementation of EmoSphere++: Emotion-Controllable Zero-Shot Text-to-Speech via Emotion-Adaptive Spherical Vec… ☆109 · Updated this week
- BeltOut: An open source pitch-perfect voice-to-voice timbre transfer model based on ChatterboxVC ☆76 · Updated last month
- A fast speech-to-speech & speech-to-text translation model that supports simultaneous decoding and offers 28× speedup. ☆75 · Updated 10 months ago
- [ICML 2025] SongGen: A Single Stage Auto-regressive Transformer for Text-to-Song Generation ☆258 · Updated 2 months ago