knottwill/sesame-finetune

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/knottwill/sesame-finetune)

knottwill / sesame-finetune

Finetune Sesame AI's conversational speech model on new languages and voices. Blog post: https://blog.speechmatics.com/sesame-finetune

☆113

Alternatives and similar repositories for sesame-finetune

Users that are interested in sesame-finetune are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

davidbrowne17 / csm-streaming
View on GitHub
Realtime demo, Streaming and Finetuning code for CSM
☆456Sep 17, 2025Updated 10 months ago
Cross-Product-Labs / csm_finetune
View on GitHub
Finetune Sesame's CSM 1B model, for fun and profit
☆17Mar 24, 2025Updated last year
Shy-98 / MELLE
View on GitHub
Unofficial PyTorch implementation of "Autoregressive Speech Synthesis without Vector Quantization (MELLE)"
☆41Jun 28, 2025Updated last year
xinshengwang / robpitch
View on GitHub
A pitch detection model trained to be robust against noise and reverberation environments.
☆27Jan 21, 2025Updated last year
stlohrey / dia-finetuning
View on GitHub
A TTS model capable of generating ultra-realistic dialogue in one pass.
☆131Jul 25, 2025Updated last year
Deploy on Railway without the complexity - Free Credits Offer • Ad
Connect your repo and Railway handles the rest with instant previews. Quickly provision container image services, databases, and storage volumes.
vivian556123 / NeurIPS2024-CoVoMix
View on GitHub
Official repo for CoVoMix: Advancing Zero-Shot Speech Generation for Human-like Multi-talker Conversations
☆67Jan 16, 2025Updated last year
yangdongchao / ALMTokenizer2
View on GitHub
The open source code of ALMTokenizer2: Towards Low bit-rate and Semantic-rich Audio Tokenizer with Flow-based Scalar Diffusion Transforme…
☆45Sep 5, 2025Updated 10 months ago
xiaomi-research / dasheng-audiogen
View on GitHub
end-to-end text to audio scene generation model
☆50Jun 16, 2026Updated last month
SparkAudio / SparkVox
View on GitHub
☆37Jun 9, 2025Updated last year
kyutai-labs / moshi-finetune
View on GitHub
☆475Oct 3, 2025Updated 9 months ago
ASLP-lab / LLaSA_Plus
View on GitHub
Llasa Speed Up
☆64Jan 18, 2026Updated 6 months ago
davidbrowne17 / Mimi-Voice
View on GitHub
Create Unmute voice embeddings
☆26Nov 15, 2025Updated 8 months ago
Mddct / transformer-vocos
View on GitHub
☆35Sep 6, 2025Updated 10 months ago
ReisCook / VoiceAssistant
View on GitHub
A functioning Sesame CSM project with a desktop GUI - Real-time factor: 0.6x with 4070 Ti Super - Requires only 8GB VRAM
☆81May 19, 2025Updated last year
Proton VPN Special Offer - Get 70% off • Ad
Special partner offer. Trusted by over 100 million users worldwide. Tested, Approved and Recommended by Experts.
jingzhunxue / FlowMirror_HydraVox
View on GitHub
FlowMirror-HydraVox — A natively accelerated multi-head autoregressive TTS system derived from CosyVoice 3.0. It predicts multiple tokens…
☆49Feb 17, 2026Updated 5 months ago
davidbrowne17 / csm-streaming-tf
View on GitHub
A transformers implementation of csm-streaming
☆30May 16, 2025Updated last year
Ereboas / MagiCodec
View on GitHub
A single-layer, streaming codec model providing SOTA audio quality and discrete tokens designed for superior downstream modelability.
☆125Jun 4, 2025Updated last year
lmxue / NVV-SuperBench
View on GitHub
NVV-SuperBench: Beyond Words, Beyond Quality—Benchmarking Nonverbal Vocalizations in Speech Generation (Interspeech 2026 long paper)
☆18Jun 21, 2026Updated last month
Soul-AILab / SAC
View on GitHub
[ACL 2026 Main] Training, inference, and testing of the SAC speech codec model.
☆108Nov 1, 2025Updated 8 months ago
alibaba / vstyle
View on GitHub
☆34Sep 15, 2025Updated 10 months ago
KdaiP / DC-Speech-VAE
View on GitHub
5Hz Deep-Compression Speech VAE for AR-Diffusion and CALMs
☆57Nov 19, 2025Updated 8 months ago
LAION-AI / scaled-echo-tts
View on GitHub
Scaled diffusion transformer for text-to-speech synthesis (DiT + T5Gemma2 conditioning, TorchTitan & Megatron backends, tested up to 1024…
☆24Mar 29, 2026Updated 4 months ago
yangdongchao / ALMTokenizer
View on GitHub
The demo page for ALMTokenizer
☆59Apr 14, 2025Updated last year
1-Click AI Models by DigitalOcean Gradient • Ad
Deploy popular AI models on DigitalOcean Gradient GPU virtual machines with just a single click. Zero configuration with optimized deployments.
thomasgauthier / csm-hf
View on GitHub
Implementation of Sesame's Conversational Speech Model for Hugging Face Transformers
☆58May 17, 2025Updated last year
ZhikangNiu / Semantic-VAE
View on GitHub
[INTERSPEECH 2026 Oral]Official code for "Semantic-VAE: Semantic-Alignment Latent Representation for Better Speech Synthesis"
☆121Jun 21, 2026Updated last month
bigai-nlco / UltraVoice
View on GitHub
Official Repository of UltraVoice
☆63Oct 28, 2025Updated 9 months ago
xiaomi-research / tts-prism
View on GitHub
☆47Apr 27, 2026Updated 3 months ago
xiaomi-research / dasheng-tokenizer
View on GitHub
State-of-the-art continious audio tokenization
☆40Mar 9, 2026Updated 4 months ago
SparkAudio / VoxBox
View on GitHub
A large-scale speech corpus introduced in Spark-TTS, built from diverse open-source datasets for training text-to-speech (TTS) systems.
☆115May 5, 2025Updated last year
y-ren16 / OV-InstructTTS
View on GitHub
☆22Jan 27, 2026Updated 6 months ago
multimodal-art-projection / Open-Suno
View on GitHub
trying to reproduce suno v3
☆35Jan 29, 2025Updated last year
isaiahbjork / csm-voice-cloning
View on GitHub
Sesame CSM 1B Voice Cloning
☆339Mar 15, 2025Updated last year
Deploy on Railway without the complexity - Free Credits Offer • Ad
Connect your repo and Railway handles the rest with instant previews. Quickly provision container image services, databases, and storage volumes.
AmphionTeam / Emilia-NV
View on GitHub
Official Repository of Paper: "Emilia-NV: A Non-Verbal Speech Dataset with Word-Level Annotation for Human-Like Speech Modeling"
☆92Sep 18, 2025Updated 10 months ago
HeCheng0625 / Diffusion-Speech-Tokenizer
View on GitHub
This repository contains a series of works on diffusion-based speech tokenizers, including the official implementation of the paper: "TaD…
☆198Jan 25, 2026Updated 6 months ago
yanghaha0908 / WavCube
View on GitHub
Official code for "WavCube: Unifying Speech Representation for Understanding and Generation via Semantic-Acoustic Joint Modeling"
☆62Jun 27, 2026Updated last month
nytopop / csm
View on GitHub
A Conversational Speech Generation Model
☆14Mar 16, 2025Updated last year
wetdog / wavenext_pytorch
View on GitHub
Unofficial implementation of wavenext vocoder
☆59Aug 28, 2024Updated last year
thu-spmi / CTC-TTS
View on GitHub
Code for CTC-TTS: LLM-based dual-streaming text-to-speech with CTC alignment, Interspeech 2026.
☆20Jun 9, 2026Updated last month
hhguo / SoCodec
View on GitHub
Ultra-low-bitrate Speech Codec for Speech Language Modeling Applications
☆92Dec 20, 2024Updated last year