open-mmlab / Amphion
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.
☆9,505 Updated 6 months ago
Alternatives and similar repositories for Amphion
Users who are interested in Amphion are comparing it to the libraries listed below.
- High-quality multi-lingual text-to-speech library by MyShell.ai. Support English, Spanish, French, Chinese, Japanese and Korean. ☆6,965 Updated 11 months ago
- Inference and training library for high-quality TTS models. ☆5,484 Updated 11 months ago
- EmotiVoice 😊: a Multi-Voice and Prompt-Controlled TTS Engine ☆8,377 Updated last year
- zero-shot voice conversion & singing voice conversion, with real-time support ☆3,435 Updated 7 months ago
- SOTA Open Source TTS ☆24,106 Updated 3 weeks ago
- Foundational model for human-like, expressive TTS ☆4,198 Updated last year
- Distilled variant of Whisper for speech recognition. 6x faster, 50% smaller, within 1% word error rate. ☆3,985 Updated 10 months ago
- An Open Source text-to-speech system built by inverting Whisper. ☆4,533 Updated 5 months ago
- An AI-Powered Speech Processing Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Enhancement, Separation, and Target Spe… ☆3,659 Updated 3 months ago
- Multilingual Voice Understanding Model ☆6,993 Updated 3 months ago
- Zero-Shot Speech Editing and Text-to-Speech in the Wild ☆8,442 Updated 8 months ago
- StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models ☆6,063 Updated last year
- Multi-lingual large voice generation model, providing inference, training and deployment full-stack ability. ☆17,322 Updated last month
- Text-to-Audio/Music Generation ☆2,520 Updated last year
- GLM-4-Voice | End-to-end Chinese-English spoken dialogue model ☆3,085 Updated 11 months ago
- A fast multimodal LLM for real-time voice ☆4,267 Updated 2 months ago
- ☆4,560 Updated 5 months ago
- LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve spee… ☆3,098 Updated 6 months ago
- Speech-to-text, text-to-speech, speaker diarization, speech enhancement, source separation, and VAD using next-gen Kaldi with onnxruntime… ☆9,016 Updated this week
- Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching" ☆13,656 Updated 2 weeks ago
- Instant voice cloning by MIT and MyShell. Audio foundation model. ☆35,478 Updated 7 months ago
- AI powered speech denoising and enhancement ☆2,060 Updated 11 months ago
- Moshi is a speech-text foundation model and full-duplex spoken dialogue framework. It uses Mimi, a state-of-the-art streaming neural audi… ☆9,101 Updated last week
- Generative models for conditional audio generation ☆3,507 Updated last month
- A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity… ☆13,540 Updated last month
- open-source multimodal large language model that can hear, talk while thinking. Featuring real-time end-to-end speech input and streaming… ☆3,438 Updated last year
- Use Microsoft Edge's online text-to-speech service from Python WITHOUT needing Microsoft Edge or Windows or an API key ☆9,410 Updated 2 months ago
- Controllable and fast Text-to-Speech for over 7000 languages! ☆1,660 Updated 4 months ago
- Foundational Models for State-of-the-Art Speech and Text Translation ☆11,713 Updated last year
- The official repo of Qwen2-Audio chat & pretrained large audio language model proposed by Alibaba Cloud. ☆1,936 Updated 7 months ago