open-mmlab / Amphion
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.
☆9,661Updated 7 months ago
Alternatives and similar repositories for Amphion
Users who are interested in Amphion are comparing it to the libraries listed below.
- Inference and training library for high-quality TTS models.☆5,508Updated last year
- SOTA Open Source TTS☆24,650Updated 2 weeks ago
- High-quality multi-lingual text-to-speech library by MyShell.ai. Support English, Spanish, French, Chinese, Japanese and Korean.☆7,148Updated last year
- EmotiVoice 😊: a Multi-Voice and Prompt-Controlled TTS Engine☆8,413Updated last year
- Foundational model for human-like, expressive TTS☆4,190Updated last year
- StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models☆6,133Updated last year
- Instant voice cloning by MIT and MyShell. Audio foundation model.☆35,819Updated 9 months ago
- Multi-lingual large voice generation model, providing inference, training and deployment full-stack ability.☆19,224Updated last week
- Zero-Shot Speech Editing and Text-to-Speech in the Wild☆8,455Updated 10 months ago
- Moshi is a speech-text foundation model and full-duplex spoken dialogue framework. It uses Mimi, a state-of-the-art streaming neural audi…☆9,333Updated last week
- Generative models for conditional audio generation☆3,575Updated 3 weeks ago
- Distilled variant of Whisper for speech recognition. 6x faster, 50% smaller, within 1% word error rate.☆4,027Updated last year
- Use Microsoft Edge's online text-to-speech service from Python WITHOUT needing Microsoft Edge or Windows or an API key☆9,859Updated last month
- Multilingual Voice Understanding Model☆7,399Updated 3 weeks ago
- A fast multimodal LLM for real-time voice☆4,328Updated last month
- Foundational Models for State-of-the-Art Speech and Text Translation☆11,736Updated last year
- An Open Source text-to-speech system built by inverting Whisper.☆4,549Updated last month
- An open source implementation of Microsoft's VALL-E X zero-shot TTS model. Demo is available at https://plachtaa.github.io/vallex/☆7,960Updated last year
- MARS5 speech model (TTS) from CAMB.AI☆2,813Updated last year
- Zero-shot voice conversion & singing voice conversion, with real-time support☆3,548Updated 9 months ago
- An AI-Powered Speech Processing Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Enhancement, Separation, and Target Spe…☆3,856Updated 5 months ago
- A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity…☆14,597Updated 2 weeks ago
- LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve spee…☆3,114Updated 8 months ago
- Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"☆14,001Updated this week
- AI powered speech denoising and enhancement☆2,161Updated last year
- Text-to-Audio/Music Generation☆2,565Updated last year
- AniPortrait: Audio-Driven Synthesis of Photorealistic Portrait Animation☆5,024Updated last year
- [SIGGRAPH Asia 2022] VideoReTalking: Audio-based Lip Synchronization for Talking Head Video Editing In the Wild☆7,203Updated last year
- Fast and accurate automatic speech recognition (ASR) for edge devices☆3,096Updated 2 months ago
- Open-source, accurate and easy-to-use video speech recognition & clipping tool, LLM-based AI clipping integrated.☆5,296Updated 6 months ago