open-mmlab / Amphion
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.
☆8,063 · Updated this week
Alternatives and similar repositories for Amphion:
Users interested in Amphion are comparing it to the libraries listed below:
- Inference and training library for high-quality TTS models. ☆4,910 · Updated last month
- Multi-lingual large voice generation model, providing full-stack inference, training, and deployment capabilities. ☆9,662 · Updated this week
- Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching" ☆8,947 · Updated this week
- High-quality multi-lingual text-to-speech library by MyShell.ai. Supports English, Spanish, French, Chinese, Japanese, and Korean. ☆5,400 · Updated 3 weeks ago
- SOTA Open Source TTS ☆18,396 · Updated this week
- ☆7,156 · Updated this week
- Zero-Shot Speech Editing and Text-to-Speech in the Wild ☆8,011 · Updated 6 months ago
- EmotiVoice 😊: a Multi-Voice and Prompt-Controlled TTS Engine ☆7,594 · Updated 5 months ago
- Multilingual Voice Understanding Model ☆4,097 · Updated last week
- Speech-to-text, text-to-speech, speaker diarization, and VAD using next-gen Kaldi with onnxruntime without Internet connection. Support e… ☆4,298 · Updated this week
- Foundational model for human-like, expressive TTS ☆3,979 · Updated 5 months ago
- A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity… ☆7,722 · Updated this week
- GLM-4-Voice | End-to-end Chinese-English spoken dialogue model ☆2,565 · Updated last month
- MiniCPM-o 2.6: A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming on Your Phone ☆13,445 · Updated this week
- Real-time interactive streaming digital human ☆4,326 · Updated 2 weeks ago
- An open-source text-to-speech system built by inverting Whisper. ☆4,080 · Updated last month
- Open-source multimodal large language model that can hear and talk while thinking. Featuring real-time end-to-end speech input and streaming… ☆3,066 · Updated 2 months ago
- Generative models for conditional audio generation ☆2,833 · Updated last week
- AniPortrait: Audio-Driven Synthesis of Photorealistic Portrait Animation ☆4,775 · Updated 6 months ago
- Tiny vision language model ☆6,732 · Updated this week
- StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models ☆5,265 · Updated 5 months ago
- Enjoy the magic of Diffusion models! ☆6,742 · Updated this week
- 🔍 An LLM-based Multi-agent Framework of Web Search Engine (like Perplexity.ai Pro and SearchGPT) ☆5,719 · Updated last week
- Accepted as [NeurIPS 2024] Spotlight Presentation Paper ☆6,116 · Updated 3 months ago
- Official inference repo for FLUX.1 models ☆19,466 · Updated last week
- Your image is almost there! ☆7,468 · Updated 5 months ago
- On-device Speech Recognition for Apple Silicon ☆4,127 · Updated this week
- Perplexica is an AI-powered search engine. It is an open-source alternative to Perplexity AI. ☆18,641 · Updated last week
- Open-source, accurate, and easy-to-use video speech recognition & clipping tool, with integrated LLM-based AI clipping. ☆4,020 · Updated 4 months ago
- An AI-Powered Speech Processing Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Enhancement, Separation, and Target Spe… ☆2,014 · Updated this week