open-mmlab / Amphion
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.
☆7,647 · Updated this week
Related projects
Alternatives and complementary repositories for Amphion
- Zero-Shot Speech Editing and Text-to-Speech in the Wild ☆7,645 · Updated 4 months ago
- Inference and training library for high-quality TTS models. ☆4,658 · Updated 3 weeks ago
- ☆6,781 · Updated 2 weeks ago
- High-quality multilingual text-to-speech library by MyShell.ai. Supports English, Spanish, French, Chinese, Japanese, and Korean. ☆4,827 · Updated 3 months ago
- Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching" ☆7,251 · Updated this week
- Foundational model for human-like, expressive TTS ☆3,895 · Updated 3 months ago
- Brand new TTS solution ☆14,572 · Updated this week
- An open-source text-to-speech system built by inverting Whisper. ☆3,982 · Updated 5 months ago
- StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models ☆4,962 · Updated 3 months ago
- [SIGGRAPH Asia 2022] VideoReTalking: Audio-based Lip Synchronization for Talking Head Video Editing In the Wild ☆6,652 · Updated 3 months ago
- Instant voice cloning by MIT and MyShell. ☆29,839 · Updated 2 months ago
- Speech To Speech: an effort toward an open-source, modular GPT-4o ☆3,540 · Updated 2 weeks ago
- AniPortrait: Audio-Driven Synthesis of Photorealistic Portrait Animation ☆4,652 · Updated 4 months ago
- tiny vision language model ☆5,760 · Updated this week
- Generative models for conditional audio generation ☆2,724 · Updated 2 weeks ago
- OCR, layout analysis, reading order, table recognition in 90+ languages ☆14,240 · Updated this week
- Text-to-Audio/Music Generation ☆2,306 · Updated last month
- Use Microsoft Edge's online text-to-speech service from Python WITHOUT needing Microsoft Edge or Windows or an API key ☆6,334 · Updated this week
- LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve spee… ☆2,571 · Updated last week
- GLM-4-Voice | End-to-end Chinese-English spoken dialogue model ☆2,289 · Updated last week
- EmotiVoice 😊: a Multi-Voice and Prompt-Controlled TTS Engine ☆7,448 · Updated 3 months ago
- Real time interactive streaming digital human ☆3,940 · Updated this week
- open-source multimodal large language model that can hear, talk while thinking. Featuring real-time end-to-end speech input and streaming… ☆3,092 · Updated 2 weeks ago
- Accepted as [NeurIPS 2024] Spotlight Presentation Paper ☆5,955 · Updated last month
- Code and dataset for photorealistic Codec Avatars driven from audio ☆2,710 · Updated 2 months ago
- Multilingual Voice Understanding Model ☆3,450 · Updated last month
- Emote Portrait Alive: Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions ☆7,503 · Updated 3 months ago
- InstantID: Zero-shot Identity-Preserving Generation in Seconds 🔥 ☆11,121 · Updated 4 months ago
- first base model for full-duplex conversational audio ☆1,560 · Updated last week
- Various AI scripts. Mostly Stable Diffusion stuff. ☆3,408 · Updated 3 weeks ago