open-mmlab / Amphion
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.
☆9,424 · Updated 4 months ago
Alternatives and similar repositories for Amphion
Users that are interested in Amphion are comparing it to the libraries listed below
- High-quality multilingual text-to-speech library by MyShell.ai. Supports English, Spanish, French, Chinese, Japanese, and Korean. ☆6,826 · Updated 9 months ago
- Multilingual large voice generation model, providing full-stack inference, training, and deployment capabilities. ☆16,702 · Updated last week
- Inference and training library for high-quality TTS models. ☆5,435 · Updated 9 months ago
- Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching" ☆13,346 · Updated 3 weeks ago
- StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models ☆5,989 · Updated last year
- Multilingual Voice Understanding Model ☆6,700 · Updated last month
- SOTA Open Source TTS ☆23,076 · Updated this week
- Zero-shot voice conversion & singing voice conversion, with real-time support ☆3,287 · Updated 5 months ago
- An AI-Powered Speech Processing Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Enhancement, Separation, and Target Spe… ☆3,455 · Updated last month
- Foundational model for human-like, expressive TTS ☆4,166 · Updated last year
- Instant voice cloning by MIT and MyShell. Audio foundation model. ☆34,581 · Updated 5 months ago
- Speech-to-text, text-to-speech, speaker diarization, speech enhancement, source separation, and VAD using next-gen Kaldi with onnxruntime… ☆7,682 · Updated this week
- Zero-Shot Speech Editing and Text-to-Speech in the Wild ☆8,398 · Updated 6 months ago
- GLM-4-Voice | End-to-end Chinese-English spoken dialogue model ☆3,055 · Updated 10 months ago
- A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity… ☆12,894 · Updated last week
- An Open Source text-to-speech system built by inverting Whisper. ☆4,472 · Updated 4 months ago
- ☆4,533 · Updated 3 months ago
- [SIGGRAPH Asia 2022] VideoReTalking: Audio-based Lip Synchronization for Talking Head Video Editing In the Wild ☆7,140 · Updated last year
- AI powered speech denoising and enhancement ☆1,991 · Updated 10 months ago
- EmotiVoice 😊: a Multi-Voice and Prompt-Controlled TTS Engine ☆8,340 · Updated last year
- Use Microsoft Edge's online text-to-speech service from Python WITHOUT needing Microsoft Edge or Windows or an API key ☆9,163 · Updated last month
- An open source implementation of Microsoft's VALL-E X zero-shot TTS model. A demo is available at https://plachtaa.github.io/vallex/ ☆7,923 · Updated last year
- A fast multimodal LLM for real-time voice ☆4,211 · Updated last month
- Moshi is a speech-text foundation model and full-duplex spoken dialogue framework. It uses Mimi, a state-of-the-art streaming neural audi… ☆8,960 · Updated last week
- LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve spee… ☆3,073 · Updated 4 months ago
- Generative models for conditional audio generation ☆3,451 · Updated 2 months ago
- Fast and accurate automatic speech recognition (ASR) for edge devices ☆2,897 · Updated last month
- Controllable and fast Text-to-Speech for over 7000 languages! ☆1,646 · Updated 3 months ago
- AniPortrait: Audio-Driven Synthesis of Photorealistic Portrait Animation ☆5,003 · Updated last year
- Officially recommended ChatTTS resource collection project, gathering related resources and FAQs from across the web ☆1,790 · Updated last year