kyutai-labs / moshi-finetune
☆203 Updated 3 weeks ago
Alternatives and similar repositories for moshi-finetune:
Users who are interested in moshi-finetune are comparing it to the repositories listed below:
- Codec for paper: LLaSA: Scaling Train-time and Inference-time Compute for LLaMA-based Speech Synthesis ☆253 Updated last month
- LLMVoX: Autoregressive Streaming Text-to-Speech Model for Any LLM ☆240 Updated last month
- Collection of Open Source Speech Data ☆153 Updated 5 months ago
- Real-time Speech-Text Foundation Model Toolkit (wip) ☆224 Updated last month
- VoiceStar: Robust, Duration-controllable TTS that can Extrapolate ☆104 Updated 2 weeks ago
- SlamKit is an open source tool kit for efficient training of SpeechLMs. It was used for "Slamming: Training a Speech Language Model on On… ☆204 Updated 3 weeks ago
- Official implementation of the TTS model Lina-Speech ☆163 Updated 3 months ago
- ☆113 Updated 2 weeks ago
- VoiceBench: Benchmarking LLM-Based Voice Assistants ☆180 Updated this week
- StyleTTS-ZS: Efficient High-Quality Zero-Shot Text-to-Speech Synthesis with Distilled Time-Varying Style Diffusion ☆174 Updated 6 months ago
- An espeak-compatible, permissively-licensed IPA phonemizer (G2P) based on DeepPhonemizer. Usable as a drop-in replacement for espeak's GP… ☆95 Updated 6 months ago
- A simple, hackable text-to-speech system in PyTorch and MLX ☆153 Updated 2 months ago
- Provide Gradio custom components to make the diarization-based audio labeling process easier and faster. ☆62 Updated 2 weeks ago
- ☆255 Updated last year
- ☆356 Updated 7 months ago
- VALL-E 2 reproduction ☆125 Updated 9 months ago
- a Frontier Japanese Speech Generation net ☆31 Updated last month
- G2P ☆218 Updated last week
- A family of state-of-the-art Transformer-based audio codecs for low-bitrate high-quality audio coding. ☆354 Updated last week
- This repository contains the code and data for the paper EmoKnob: Enhance Voice Cloning with Fine-Grained Emotion Control by Haozhe Chen,… ☆69 Updated 6 months ago
- DEX-TTS: Diffusion-based EXpressive TTS with Style Modeling on Time Variability ☆101 Updated 3 months ago
- ☆62 Updated 9 months ago
- Official repository of the paper "MuQ: Self-Supervised Music Representation Learning with Mel Residual Vector Quantization". ☆177 Updated 3 months ago
- ☆40 Updated 2 months ago
- PyTorch implementation of Audio Flamingo 2: An Audio-Language Model with Long-Audio Understanding and Expert Reasoning Abilities. ☆459 Updated last week
- Open TTS models, built for streaming on the edge ☆39 Updated last month
- A trainer for SNAC (Multi-Scale Neural Audio Codec) in which the decoder has been replaced with Vocos. ☆50 Updated 5 months ago
- Official implementation of the paper "BigCodec: Pushing the Limits of Low-Bitrate Neural Speech Codec" ☆156 Updated 7 months ago
- Audiogen Codec ☆135 Updated 9 months ago
- ☆284 Updated 10 months ago