jasonppy / VoiceCraft
Zero-Shot Speech Editing and Text-to-Speech in the Wild
☆8,270 Updated 2 months ago
Alternatives and similar repositories for VoiceCraft
Users who are interested in VoiceCraft compare it to the libraries listed below.
- Inference and training library for high-quality TTS models. ☆5,261 Updated 5 months ago
- High-quality multi-lingual text-to-speech library by MyShell.ai. Support English, Spanish, French, Chinese, Japanese and Korean. ☆6,099 Updated 5 months ago
- StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models ☆5,750 Updated 9 months ago
- An Open Source text-to-speech system built by inverting Whisper. ☆4,257 Updated last month
- Foundational model for human-like, expressive TTS ☆4,129 Updated 10 months ago
- Instant voice cloning by MIT and MyShell. Audio foundation model. ☆32,401 Updated last month
- Moshi is a speech-text foundation model and full-duplex spoken dialogue framework. It uses Mimi, a state-of-the-art streaming neural audi… ☆8,319 Updated last week
- Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junio… ☆9,097 Updated this week
- AniPortrait: Audio-Driven Synthesis of Photorealistic Portrait Animation ☆4,947 Updated 10 months ago
- Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching" ☆12,013 Updated last week
- tiny vision language model ☆8,019 Updated last week
- first base model for full-duplex conversational audio ☆1,746 Updated 4 months ago
- Distilled variant of Whisper for speech recognition. 6x faster, 50% smaller, within 1% word error rate. ☆3,864 Updated 4 months ago
- Generative models for conditional audio generation ☆3,268 Updated last week
- 🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production ☆40,319 Updated 9 months ago
- V-Express aims to generate a talking head video under the control of a reference image, an audio, and a sequence of V-Kps images. ☆2,332 Updated 4 months ago
- MARS5 speech model (TTS) from CAMB.AI ☆2,760 Updated 9 months ago
- Build a Perplexity-Inspired Answer Engine Using Next.js, Groq, Llama-3, Langchain, OpenAI, Upstash, Brave & Serper ☆4,896 Updated 8 months ago
- ☆1,128 Updated 3 months ago
- A single Gradio + React WebUI with extensions for ACE-Step, Kimi Audio, Piper TTS, GPT-SoVITS, CosyVoice, XTTSv2, DIA, Kokoro, OpenVoice,… ☆2,192 Updated 2 weeks ago
- Official Code for Stable Cascade ☆6,594 Updated 10 months ago
- Code and dataset for photorealistic Codec Avatars driven from audio ☆2,801 Updated 8 months ago
- SOTA Open Source TTS ☆21,227 Updated this week
- A fast multimodal LLM for real-time voice ☆3,968 Updated 3 months ago
- AllTalk is based on the Coqui TTS engine, similar to the Coqui_tts extension for Text generation webUI, however supports a variety of adv… ☆1,825 Updated last month
- Official implementation of AnimateDiff. ☆11,414 Updated 10 months ago
- Controllable and fast Text-to-Speech for over 7000 languages! ☆1,597 Updated last week
- LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve spee… ☆2,928 Updated last week
- LLocalSearch is a completely locally running search aggregator using LLM Agents. The user can ask a question and the system will use a ch… ☆5,914 Updated last month
- Jan is an open source alternative to ChatGPT that runs 100% offline on your computer ☆29,198 Updated this week