open-mmlab / Amphion
Amphion (/æmˈfaɪən/) is a toolkit for audio, music, and speech generation. It aims to support reproducible research and to help junior researchers and engineers get started in audio, music, and speech generation research and development.
☆8,969Updated 2 weeks ago
Alternatives and similar repositories for Amphion:
Users who are interested in Amphion are comparing it to the libraries listed below.
- Inference and training library for high-quality TTS models.☆5,212Updated 4 months ago
- Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"☆11,448Updated last week
- Zero-Shot Speech Editing and Text-to-Speech in the Wild☆8,251Updated last month
- High-quality multi-lingual text-to-speech library by MyShell.ai. Supports English, Spanish, French, Chinese, Japanese and Korean.☆5,946Updated 4 months ago
- Foundational model for human-like, expressive TTS☆4,098Updated 8 months ago
- SOTA Open Source TTS☆20,753Updated 2 weeks ago
- Multi-lingual large voice generation model, providing full-stack inference, training, and deployment capabilities.☆13,204Updated last week
- Enjoy the magic of Diffusion models!☆8,406Updated this week
- Generative models for conditional audio generation☆3,028Updated last month
- Text-to-Audio/Music Generation☆2,407Updated 6 months ago
- An open source implementation of Microsoft's VALL-E X zero-shot TTS model. Demo is available at https://plachtaa.github.io/vallex/ ☆7,852Updated last year
- Foundational Models for State-of-the-Art Speech and Text Translation☆11,492Updated 5 months ago
- StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models☆5,656Updated 8 months ago
- Multilingual Voice Understanding Model☆5,393Updated last month
- A fast multimodal LLM for real-time voice☆3,855Updated 2 months ago
- MuseTalk: Real-Time High Quality Lip Synchronization with Latent Space Inpainting☆3,974Updated last week
- GLM-4-Voice | End-to-end Chinese-English spoken dialogue model☆2,858Updated 4 months ago
- EmotiVoice 😊: a Multi-Voice and Prompt-Controlled TTS Engine☆7,918Updated 8 months ago
- Speech-to-text, text-to-speech, speaker diarization, speech enhancement, and VAD using next-gen Kaldi with onnxruntime without Internet c…☆5,739Updated this week
- Accepted as [NeurIPS 2024] Spotlight Presentation Paper☆6,271Updated 7 months ago
- tiny vision language model☆7,817Updated last week
- Speech To Speech: an effort for an open-sourced and modular GPT-4o☆3,989Updated last week
- Instant voice cloning by MIT and MyShell. Audio foundation model.☆31,877Updated last week
- text and image to video generation: CogVideoX (2024) and CogVideo (ICLR 2023)☆11,280Updated last month
- LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve spee…☆2,899Updated last week
- An Open Source text-to-speech system built by inverting Whisper.☆4,217Updated 2 weeks ago
- A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity…☆9,922Updated this week
- MiniCPM-o 2.6: A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming on Your Phone☆19,286Updated last month
- ☆4,213Updated last month
- Moshi is a speech-text foundation model and full-duplex spoken dialogue framework. It uses Mimi, a state-of-the-art streaming neural audi…☆8,106Updated last week