SWivid / F5-TTS
Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"
☆9,735Updated this week
Alternatives and similar repositories for F5-TTS:
Users who are interested in F5-TTS are comparing it to the repositories listed below
- SOTA Open Source TTS☆19,362Updated this week
- Multi-lingual large voice generation model, providing inference, training and deployment full-stack ability.☆10,875Updated this week
- Inference and training library for high-quality TTS models.☆5,025Updated 2 months ago
- High-quality multi-lingual text-to-speech library by MyShell.ai. Support English, Spanish, French, Chinese, Japanese and Korean.☆5,607Updated last month
- Multilingual Voice Understanding Model☆4,551Updated last month
- StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models☆5,465Updated 6 months ago
- Instant voice cloning by MIT and MyShell. Audio foundation model.☆30,988Updated last month
- Open-source, accurate and easy-to-use video speech recognition & clipping tool, with LLM-based AI clipping integrated.☆4,167Updated 5 months ago
- Zero-Shot Speech Editing and Text-to-Speech in the Wild☆8,128Updated 7 months ago
- Use Microsoft Edge's online text-to-speech service from Python WITHOUT needing Microsoft Edge or Windows or an API key☆7,396Updated 2 weeks ago
- Taming Stable Diffusion for Lip Sync!☆2,583Updated last month
- Foundational model for human-like, expressive TTS☆4,035Updated 6 months ago
- An Open Source text-to-speech system built by inverting Whisper.☆4,120Updated 2 months ago
- Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junio…☆8,544Updated 2 weeks ago
- Enjoy the magic of Diffusion models!☆6,842Updated this week
- MARS5 speech model (TTS) from CAMB.AI☆2,620Updated 6 months ago
- A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity…☆8,303Updated this week
- TTS Generation Web UI (Bark, MusicGen + AudioGen, Tortoise, RVC, Vocos, Demucs, SeamlessM4T, MAGNet, StyleTTS2, MMS, Stable Audio, Mars5,…☆2,016Updated this week
- Accepted as [NeurIPS 2024] Spotlight Presentation Paper☆6,188Updated 4 months ago
- Faster Whisper transcription with CTranslate2☆14,234Updated last month
- MuseTalk: Real-Time High Quality Lip Synchronization with Latent Space Inpainting☆3,494Updated 2 months ago
- HunyuanVideo: A Systematic Framework For Large Video Generation Model☆8,621Updated this week
- Bring portraits to life!☆14,091Updated last week
- WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)☆13,990Updated this week
- Comprehensive Gradio WebUI for audio processing, powered by Whisper engines (Whisper, Faster-Whisper, Whisper-Timestamped). Features Voic…☆3,297Updated 2 weeks ago
- Code of Pyramidal Flow Matching for Efficient Video Generative Modeling☆2,784Updated 2 months ago
- [SIGGRAPH Asia 2024, Journal Track] ToonCrafter: Generative Cartoon Interpolation☆5,636Updated 5 months ago
- Moshi is a speech-text foundation model and full-duplex spoken dialogue framework. It uses Mimi, a state-of-the-art streaming neural audi…☆7,521Updated this week
- Converts text to speech in realtime☆2,554Updated this week
- ComfyUI-Manager is an extension designed to enhance the usability of ComfyUI. It offers management functions to install, remove, disable,…☆8,658Updated this week