thomasgauthier / csm-hf
Implementation of Sesame's Conversational Speech Model for Hugging Face Transformers
☆53 · Updated 3 weeks ago
Alternatives and similar repositories for csm-hf:
Users who are interested in csm-hf are comparing it to the libraries listed below.
- Finetune Sesame's CSM 1B model, for fun and profit ☆15 · Updated last month
- ☆206 · Updated last month
- Automatically cleaning, enhancing, segmenting, filtering, and formatting a dataset to fine tune or train a voice model. ☆35 · Updated last week
- Whisper Speaker Identification (WSI), a cutting-edge model for multilingual speaker identification. ☆16 · Updated last month
- Open TTS models, built for streaming on the edge ☆41 · Updated last month
- Provide Gradio custom components to make the diarization-based audio labeling process easier and faster. ☆62 · Updated 3 weeks ago
- Sesame Converse - Real Time Conversations - Powered by Gemma 3 ☆61 · Updated last month
- VoiceStar: Robust, Duration-controllable TTS that can Extrapolate ☆141 · Updated 3 weeks ago
- Examples of using the llasa-tts models locally ☆168 · Updated 2 weeks ago
- Streaming and Finetuning code for CSM ☆267 · Updated 2 weeks ago
- LLMVoX: Autoregressive Streaming Text-to-Speech Model for Any LLM ☆242 · Updated last month
- ☆96 · Updated last year
- Real-time Speech-Text Foundation Model Toolkit (wip) ☆226 · Updated last month
- StyleTTS 2 Optimized Training Fork ☆28 · Updated 3 months ago
- Adding a multi-text multi-speaker script (diffe) that is based on a script from asiff00 on issue 61 for Sesame: A Conversational Speech G… ☆23 · Updated last month
- This is an on-CPU real-time conversational system for two-way speech communication with AI models, utilizing a continuous streaming archi… ☆117 · Updated 2 weeks ago
- Codec for paper: LLaSA: Scaling Train-time and Inference-time Compute for LLaMA-based Speech Synthesis ☆264 · Updated last month
- ☆254 · Updated this week
- StyleTTS-ZS: Efficient High-Quality Zero-Shot Text-to-Speech Synthesis with Distilled Time-Varying Style Diffusion ☆176 · Updated 7 months ago
- High quality text-to-speech based on StyleTTS 2. ☆37 · Updated this week
- Speech-to-speech AI assistant with natural conversation flow, mid-speech interruption, vision capabilities and AI-initiated follow-ups. F… ☆132 · Updated 3 weeks ago
- Efficient approach to speaker diarization using voice characteristics extraction ☆94 · Updated last year
- VALL-E 2 reproduction ☆127 · Updated 9 months ago
- Hanasu is a human-like TTS model based on the multilingual Himitsu V1 transformer-based encoder and VITS architecture ☆26 · Updated 3 weeks ago
- An unofficial PyTorch implementation of VALL-E ☆87 · Updated this week
- A Conversational Speech Generation Model ☆12 · Updated last month
- This repository contains the code and data for the paper EmoKnob: Enhance Voice Cloning with Fine-Grained Emotion Control by Haozhe Chen,… ☆71 · Updated 7 months ago
- Deploy Apollo HF space locally ☆40 · Updated 4 months ago
- ☆62 · Updated 9 months ago
- Text-to-Music Generation with Rectified Flow Transformer ☆62 · Updated 8 months ago