open-mmlab / Amphion
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.
☆8,544 · Updated 2 weeks ago
Alternatives and similar repositories for Amphion:
Users that are interested in Amphion are comparing it to the libraries listed below
- High-quality multilingual text-to-speech library by MyShell.ai. Supports English, Spanish, French, Chinese, Japanese and Korean. ☆5,607 · Updated last month
- Inference and training library for high-quality TTS models. ☆5,025 · Updated 2 months ago
- An Open Source text-to-speech system built by inverting Whisper. ☆4,120 · Updated 2 months ago
- SOTA Open Source TTS ☆19,362 · Updated this week
- Use Microsoft Edge's online text-to-speech service from Python WITHOUT needing Microsoft Edge or Windows or an API key ☆7,396 · Updated 2 weeks ago
- Zero-Shot Speech Editing and Text-to-Speech in the Wild ☆8,128 · Updated 7 months ago
- StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models ☆5,465 · Updated 6 months ago
- EmotiVoice 😊: a Multi-Voice and Prompt-Controlled TTS Engine ☆7,676 · Updated 6 months ago
- Moshi is a speech-text foundation model and full-duplex spoken dialogue framework. It uses Mimi, a state-of-the-art streaming neural audi… ☆7,521 · Updated this week
- Foundational model for human-like, expressive TTS ☆4,035 · Updated 6 months ago
- Speech-to-text, text-to-speech, speaker diarization, and VAD using next-gen Kaldi with onnxruntime without Internet connection. Support e… ☆4,913 · Updated this week
- Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching" ☆9,735 · Updated this week
- Multilingual large voice generation model, providing full-stack inference, training and deployment capabilities. ☆10,875 · Updated this week
- Multilingual Voice Understanding Model ☆4,551 · Updated last month
- A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity… ☆8,303 · Updated this week
- Open-source multimodal large language model that can hear and talk while thinking. Featuring real-time end-to-end speech input and streaming… ☆3,157 · Updated 3 months ago
- GLM-4-Voice | End-to-end Chinese-English spoken dialogue model ☆2,669 · Updated 2 months ago
- Distilled variant of Whisper for speech recognition. 6x faster, 50% smaller, within 1% word error rate. ☆3,745 · Updated last month
- Open source real-time translation app for Android that runs locally ☆7,366 · Updated last month
- Accepted as [NeurIPS 2024] Spotlight Presentation Paper ☆6,188 · Updated 4 months ago
- V-Express aims to generate a talking head video under the control of a reference image, an audio, and a sequence of V-Kps images. ☆2,307 · Updated 3 weeks ago
- MARS5 speech model (TTS) from CAMB.AI ☆2,620 · Updated 6 months ago
- Open-source, accurate and easy-to-use video speech recognition & clipping tool, with LLM-based AI clipping integrated. ☆4,167 · Updated 5 months ago
- Your image is almost there! ☆7,498 · Updated 6 months ago
- WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization) ☆13,990 · Updated this week
- Generative models for conditional audio generation ☆2,901 · Updated this week
- ☆1,154 · Updated 8 months ago
- Comprehensive Gradio WebUI for audio processing, powered by Whisper engines (Whisper, Faster-Whisper, Whisper-Timestamped). Features Voic… ☆3,297 · Updated 2 weeks ago
- AI powered speech denoising and enhancement ☆1,641 · Updated 2 months ago
- Converts text to speech in realtime ☆2,554 · Updated this week