nytopop / csm
A Conversational Speech Generation Model
☆11 · Updated last month
Alternatives and similar repositories for csm:
Users interested in csm are comparing it to the libraries listed below:
- Provide Gradio custom components to make the diarization-based audio labeling process easier and faster. ☆62 · Updated 2 weeks ago
- Implementation of Sesame's Conversational Speech Model for Hugging Face Transformers ☆45 · Updated 2 weeks ago
- Sesame Converse - Real Time Conversations - Powered by Gemma 3 ☆61 · Updated last month
- Open TTS models, built for streaming on the edge ☆39 · Updated last month
- Embarrassingly Easy Fully Non-Autoregressive Zero-Shot TTS (E2 TTS) in MLX ☆27 · Updated 6 months ago
- VoiceStar: Robust, Duration-controllable TTS that can Extrapolate ☆104 · Updated 2 weeks ago
- Automatically cleaning, enhancing, segmenting, filtering, and formatting a dataset to fine-tune or train a voice model. ☆35 · Updated this week
- A Frontier Japanese Speech Generation net ☆31 · Updated last month
- This public GitHub repository contains code for a fully self-hosted, on-premise transcription solution. ☆53 · Updated 4 months ago
- Joint speech-language model - respond directly to audio! ☆30 · Updated 11 months ago
- StyleTTS 2 Optimized Training Fork ☆27 · Updated 2 months ago
- Implementation of E2-TTS, "Embarrassingly Easy Fully Non-Autoregressive Zero-Shot TTS", in MLX ☆20 · Updated 6 months ago
- Streaming and Finetuning code for CSM ☆239 · Updated last week
- Convert your PDFs and EPUBs into audiobooks effortlessly. Features intelligent text extraction, customizable text-to-speech settings, and… ☆65 · Updated 3 weeks ago
- A simple system for 2-way interruptible voice interactions between human and LLM ☆28 · Updated last year
- A cutting-edge cascading voice assistant combining real-time speech recognition, AI reasoning, and neural text-to-speech capabilities. ☆58 · Updated 2 weeks ago
- Hanasu is a human-like TTS model based on the multilingual Himitsu V1 transformer-based encoder and VITS architecture ☆26 · Updated 2 weeks ago
- Collection of Open Source Speech Data ☆153 · Updated 5 months ago
- An espeak-compatible, permissively-licensed IPA phonemizer (G2P) based on DeepPhonemizer. Usable as a drop-in replacement for espeak's GP… ☆95 · Updated 6 months ago
- G2P ☆218 · Updated last week
- realtime conversational dynamics ☆18 · Updated last month
- A simple, hackable text-to-speech system in PyTorch and MLX ☆153 · Updated 2 months ago
- QLoRA: Efficient Finetuning of Quantized LLMs ☆11 · Updated last year
- Speaker Diarization with Transformers ☆64 · Updated 11 months ago
- Video+code lecture on building nanoGPT from scratch ☆65 · Updated 10 months ago
- Audio tokenization, in the fastest way possible! ☆51 · Updated 8 months ago
- Kimi-Audio, an open-source audio foundation model excelling in audio understanding, generation, and conversation ☆405 · Updated this week