knottwill / sesame-finetune
Finetune Sesame AI's conversational speech model on new languages and voices. Blog post: https://blog.speechmatics.com/sesame-finetune
★97 · Updated 3 months ago
Alternatives and similar repositories for sesame-finetune
Users that are interested in sesame-finetune are comparing it to the libraries listed below
- 🎙️ Automatically transcribe audio/video into high-quality, speaker-specific Text-To-Speech datasets ✨ ★131 · Updated 5 months ago
- Official code for "F5R-TTS: Improving Flow-Matching based Text-to-Speech with Group Relative Policy Optimization" ★141 · Updated 7 months ago
- This repository contains a series of works on diffusion-based speech tokenizers, including the official implementation of the paper: "TaD… ★196 · Updated 3 months ago
- A TTS model capable of generating ultra-realistic dialogue in one pass. ★127 · Updated 5 months ago
- High quality text-to-speech based on StyleTTS 2. ★71 · Updated 3 weeks ago
- Real-time Speech-Text Foundation Model Toolkit (wip) ★249 · Updated 9 months ago
- [NeurIPS' 25] Benchmark for evaluating TTS models on complex prosodic, expressiveness, and linguistic challenges. ★182 · Updated last month
- VALL-E 2 reproduction ★133 · Updated last year
- ★105 · Updated 3 months ago
- DEX-TTS: Diffusion-based EXpressive TTS with Style Modeling on Time Variability ★105 · Updated 11 months ago
- [ICASSP 2025] "FLowHigh: Towards efficient and high-quality audio super-resolution with single-step flow matching" ★97 · Updated 11 months ago
- SoloAudio: Target Sound Extraction with Language-oriented Audio Diffusion Transformer. ★113 · Updated last year
- Official implementation of the TTS model Lina-Speech ★175 · Updated last year
- A package for NeuCodec: a 50Hz, 0.8kbps, 24kHz audio codec. ★138 · Updated 3 months ago
- ★41 · Updated 11 months ago
- SSR-Speech: Towards Stable, Safe and Robust Zero-shot Speech Editing and Synthesis ★142 · Updated last year
- This repository contains the code and data for the paper EmoKnob: Enhance Voice Cloning with Fine-Grained Emotion Control by Haozhe Chen,… ★80 · Updated last year
- [TAFFC 2025] The official implementation of EmoSphere++: Emotion-Controllable Zero-Shot Text-to-Speech via Emotion-Adaptive Spherical Vec… ★113 · Updated 4 months ago
- Codebase for 'Scaling Rich Style-Prompted Text-to-Speech Datasets' ★149 · Updated 9 months ago
- ★44 · Updated last year
- ★70 · Updated last year
- Incremental Disentanglement for Environment-Aware Zero-Shot Text-to-Speech Synthesis ★27 · Updated 9 months ago
- This project trains an RWKV LLM for TTS generation that is compatible with other TTS engines (such as fish/cosy/chattts). ★92 · Updated 3 months ago
- ★348 · Updated 3 months ago
- SoTA open-source TTS ★128 · Updated 7 months ago
- This is the M-AILABS Speech Dataset ★98 · Updated this week
- An unofficial PyTorch implementation of VALL-E ★88 · Updated 5 months ago
- A Massive Multilingual Multi-speaker Speech Corpus for Scaling Indian TTS ★53 · Updated last year
- 🎙️ Automatically transcribe audio/video into high-quality, speaker-specific Text-To-Speech datasets ✨ ★17 · Updated 7 months ago
- Code for the blog "Neural audio codecs: how to get audio into LLMs" ★144 · Updated 2 months ago