Fine-tuning Moshi/J-Moshi on your own spoken dialogue data
☆98 · Jan 5, 2026 · Updated 4 months ago
Alternatives and similar repositories for moshi-finetune
Users who are interested in moshi-finetune are comparing it to the libraries listed below.
- A real-time software for turn-taking, backchannel, and head-nodding prediction ☆97 · May 20, 2026 · Updated last week
- Proof of concept for running moshi/hibiki using webrtc ☆21 · Feb 28, 2025 · Updated last year
- My hybrid TTS network that combines VALL-E, VoiceBox, SpeechFlow, Seamless and TortoiseTTS into one ☆26 · Aug 5, 2024 · Updated last year
- A real-time implementation of Voice Activity Projection (VAP) is aimed at controlling behaviors of spoken dialogue systems, such as turn-… ☆101 · Jul 24, 2025 · Updated 10 months ago
- [InterSpeech'2024] FluentEditor: Text-based Speech Editing by Considering Acoustic and Prosody Consistency ☆62 · Oct 23, 2024 · Updated last year
- ☆39 · Apr 3, 2025 · Updated last year
- ☆454 · Oct 3, 2025 · Updated 7 months ago
- Llama-Mimi is a speech language model that uses a unified tokenizer (Mimi) and a single Transformer decoder (Llama) to jointly model sequ… ☆31 · Sep 20, 2025 · Updated 8 months ago
- PyTorch implementation of WaveFit [2022, Google] which is one of SOTA lightweight/fast speech vocoders. ☆66 · Sep 8, 2025 · Updated 8 months ago
- Voice Activity Projection Models: Self-supervised learning of Turn-taking Events ☆102 · May 29, 2024 · Updated 2 years ago
- ☆25 · Jul 30, 2025 · Updated 10 months ago
- Generative Expressive Conversational Speech Synthesis (Accepted by MM'2024) ☆78 · Nov 1, 2024 · Updated last year
- Real-time Speech-Text Foundation Model Toolkit (wip) ☆257 · Mar 26, 2025 · Updated last year
- ☆69 · Jul 29, 2023 · Updated 2 years ago
- ☆11 · Jan 10, 2024 · Updated 2 years ago
- ☆19 · Mar 22, 2024 · Updated 2 years ago
- ☆36 · Sep 6, 2025 · Updated 8 months ago
- JAX implementation of Large Language Models. You can train GPT-2-like model with 青空文庫 (aozora bunko-clean dataset) or any other text dat… ☆13 · Aug 5, 2024 · Updated last year
- Code associated with the paper: CTC-DRO: Robust Optimization for Reducing Language Disparities in Speech Recognition. ☆17 · May 16, 2025 · Updated last year
- X-E-Speech: Joint Training Framework of Non-Autoregressive Cross-lingual Emotional Text-to-Speech and Voice Conversion ☆112 · Apr 1, 2024 · Updated 2 years ago
- Text-To-Speech for NotebookLM ☆39 · Jul 20, 2025 · Updated 10 months ago
- Official code for "F5R-TTS: Improving Flow-Matching based Text-to-Speech with Group Relative Policy Optimization" ☆159 · Mar 3, 2026 · Updated 2 months ago
- ☆44 · Sep 19, 2024 · Updated last year
- Coco-Nut (Corpus of connecting NIHONGO utterance and text) corpus ☆21 · Jun 12, 2024 · Updated last year
- SCOREQ: Speech COntrastive REgression for Quality Assessment (NeurIPS 2024) ☆113 · Aug 1, 2025 · Updated 9 months ago
- a Frontier Japanese Speech Generation net ☆64 · May 15, 2025 · Updated last year
- Unofficial pytorch implementation of VISinger: Variational Inference with Adversarial Learning for End-to-end Singing Voice Synthesis (IC… ☆20 · May 12, 2023 · Updated 3 years ago
- PromptTTS++: Controlling Speaker Identity in Prompt-Based Text-To-Speech Using Natural Language Descriptions ☆86 · Oct 11, 2024 · Updated last year
- Descript Audio Codec - VAE Variant (.dac-vae): High-Fidelity Audio Compression with Variational Autoencoder ☆36 · Aug 30, 2025 · Updated 9 months ago
- Official Repository of Paper: "SynParaSpeech: Automated Synthesis of Paralinguistic Datasets for Speech Generation and Understanding" (IC… ☆71 · Apr 27, 2026 · Updated last month
- Code for vec2wav 2.0, a speech token vocoder for VC. Paper: https://arxiv.org/abs/2409.01995 ☆79 · Dec 3, 2024 · Updated last year
- WavReward: Spoken Dialogue Models With Generalist Reward Evaluators ☆56 · May 15, 2025 · Updated last year
- T5Voice is a lightweight PyTorch implementation of T5-based text-to-speech synthesis, supporting both streaming and non-streaming speech … ☆28 · Nov 7, 2025 · Updated 6 months ago
- Variable Bitrate Residual Vector Quantization for Audio Coding ☆52 · May 1, 2025 · Updated last year
- Generative Expressive Conversational Speech Synthesis (Accepted by MM'2024) ☆62 · Nov 1, 2024 · Updated last year
- This is the official train-dev-test release of the Interspeech2024 Discrete Speech Representation Challenge. ☆32 · Jan 26, 2024 · Updated 2 years ago
- Official implementation of EdiTTS: Score-based Editing for Controllable Text-to-Speech (INTERSPEECH 2022) ☆122 · Jan 24, 2023 · Updated 3 years ago
- ☆151 · Apr 25, 2025 · Updated last year
- Please visit https://thuhcsi.github.io/SnakeGAN/ ☆37 · Apr 25, 2023 · Updated 3 years ago