thomasgauthier/csm-hf

Implementation of Sesame's Conversational Speech Model for Hugging Face Transformers

☆58

Alternatives and similar repositories for csm-hf

Users who are interested in csm-hf are comparing it to the libraries listed below.

davidbrowne17 / csm-streaming
Realtime demo, Streaming and Finetuning code for CSM
☆454 · Sep 17, 2025 · Updated 9 months ago
Cross-Product-Labs / csm_finetune
Finetune Sesame's CSM 1B model, for fun and profit
☆17 · Mar 24, 2025 · Updated last year
ruapotato / csm-buddy
Playing with CSM
☆22 · Mar 14, 2025 · Updated last year
nytopop / illu
realtime conversational dynamics
☆19 · Mar 19, 2025 · Updated last year
primepake / dac_vae
Descript Audio Codec - VAE Variant (.dac-vae): High-Fidelity Audio Compression with Variational Autoencoder
☆38 · Aug 30, 2025 · Updated 10 months ago
naver-ai / RapFlow-TTS
☆55 · Jul 16, 2025 · Updated 11 months ago
zenforic / csm-multi
Adding a multi-text multi-speaker script (diffe) that is based on a script from asiff00 on issue 61 for Sesame: A Conversational Speech G…
☆26 · Mar 28, 2025 · Updated last year
SonyResearch / VRVQ
Variable Bitrate Residual Vector Quantization for Audio Coding
☆53 · May 1, 2025 · Updated last year
mahimairaja / awesome-csm-1b
List of curated use cases built using Sesame's CSM 1B
☆74 · May 29, 2025 · Updated last year
PkmX / orpheus-chat-webui
Orpheus Chat WebUI
☆76 · Mar 27, 2025 · Updated last year
ogunlao / glowtts_stdp
Glow-TTS with Stochastic Duration Predictor and Stochastic Pitch Predictor
☆19 · Jun 5, 2023 · Updated 3 years ago
KdaiP / DC-Speech-VAE
5Hz Deep-Compression Speech VAE for AR-Diffusion and CALMs
☆57 · Nov 19, 2025 · Updated 7 months ago
Audio-Foundation-Models / ConversationTTS
☆101 · Jan 19, 2026 · Updated 5 months ago
davidbrowne17 / Mimi-Voice
Create Unmute voice embeddings
☆26 · Nov 15, 2025 · Updated 7 months ago
shuheikatoinfo / UtterTune
LoRA-based phoneme/prosody control for LLM-based TTS with no G2P - Lightweight adapter for edit and control the target language's phoneme…
☆26 · Aug 14, 2025 · Updated 10 months ago
ashi-ta / speechGLUE
SpeechGLUE is a speech version of the GLUE benchmark, driven by text-to-speech.
☆13 · Jun 2, 2023 · Updated 3 years ago
walker-hyf / FCTalker
FCTalker: Fine and Coarse Grained Context Modeling for Expressive Conversational Speech Synthesis (Accepted by ISCSLP'2024)
☆26 · Feb 22, 2024 · Updated 2 years ago
speechio / asr-noises
A handy dataset of noises for ASR
☆22 · May 29, 2019 · Updated 7 years ago
ryota-komatsu / speech_resynth
Speech Resynthesis and Language Modeling
☆27 · Jun 11, 2025 · Updated last year
stlohrey / dia-finetuning
A TTS model capable of generating ultra-realistic dialogue in one pass.
☆131 · Jul 25, 2025 · Updated 11 months ago
RobertAgee / dia
A TTS model capable of generating ultra-realistic dialogue in one pass.
☆16 · Jun 28, 2025 · Updated last year
xi-j / Style-Talker
An official implementation of Style-Talker for Spoken Dialogue Generation
☆23 · Jan 12, 2025 · Updated last year
Jackson-Kang / Prosody-augmentation-for-Text-to-speech
Simple tool for speech dataset augmentation for modeling various prosodies.
☆14 · Jan 14, 2021 · Updated 5 years ago
lifeiteng / NotebookTTS
Text-To-Speech for NotebookLM
☆39 · Jul 20, 2025 · Updated 11 months ago
adelacvg / diff-vits
☆39 · Oct 1, 2023 · Updated 2 years ago
mushanshanshan / ESLTTS
ESLTTS dataset
☆16 · Feb 6, 2025 · Updated last year
channel-io / ch-tts-llasa-rl-grpo
☆50 · Apr 20, 2026 · Updated 2 months ago
duerig / StyleTTS2
StyleTTS 2 Optimized Training Fork
☆32 · Feb 2, 2025 · Updated last year
reppy4620 / convnext_tts
Unofficial implementation of ConvNeXt-TTS powered by lightning
☆18 · Oct 20, 2024 · Updated last year
EndlessReform / smoltts
Open TTS models, built for streaming on the edge
☆45 · Mar 16, 2025 · Updated last year
FrontierLabs / F5R-TTS
Official code for "F5R-TTS: Improving Flow-Matching based Text-to-Speech with Group Relative Policy Optimization"
☆169 · Mar 3, 2026 · Updated 4 months ago
huutuongtu / Lightvoc
LightVoc: An Upsampling-Free GAN Vocoder Based on Conformer and Inverse Short-Time Fourier Transform
☆18 · May 17, 2024 · Updated 2 years ago
videosdk-live / NAMO-Turn-Detector-v1
High-performance, semantic turn detection for conversational AI
☆43 · Oct 1, 2025 · Updated 9 months ago
msalhab96 / MultiSpeech
PyTorch implementation of the paper "MultiSpeech: Multi-Speaker Text to Speech with Transformer"
☆21 · Jun 23, 2022 · Updated 4 years ago
lifeiteng / TTS-TextAnalyzer
TTS Text Analyzer
☆31 · Jul 20, 2023 · Updated 2 years ago
microsoft / Distill-MOS
Distillation of Self-Supervised Representation-Based Speech Quality Assessment
☆49 · May 15, 2025 · Updated last year
yangdongchao / RSTnet
Real-time Speech-Text Foundation Model Toolkit (wip)
☆257 · Mar 26, 2025 · Updated last year
Thrasher-Software / sigil
A local-first LLM development studio. Build, test, and customize inference workflows with your own models: no cloud, totally local.
☆17 · May 21, 2025 · Updated last year
manmay-nakhashi / TTSizer
🎙️ Automatically transcribe audio/video into high-quality, speaker-specific Text-To-Speech datasets ✨
☆18 · May 20, 2025 · Updated last year