open-mmlab / Amphion
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.
☆9,687 Updated 8 months ago
Alternatives and similar repositories for Amphion
Users who are interested in Amphion compare it to the libraries listed below.
- High-quality multi-lingual text-to-speech library by MyShell.ai. Support English, Spanish, French, Chinese, Japanese and Korean. ☆7,186 Updated last year
- Inference and training library for high-quality TTS models. ☆5,528 Updated last year
- EmotiVoice 😊: a Multi-Voice and Prompt-Controlled TTS Engine ☆8,426 Updated last year
- StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models ☆6,162 Updated last year
- Foundational model for human-like, expressive TTS ☆4,191 Updated last year
- Zero-Shot Speech Editing and Text-to-Speech in the Wild ☆8,461 Updated 10 months ago
- SOTA Open Source TTS ☆24,863 Updated last week
- An Open Source text-to-speech system built by inverting Whisper. ☆4,555 Updated last month
- Generative models for conditional audio generation ☆3,599 Updated 3 weeks ago
- Instant voice cloning by MIT and MyShell. Audio foundation model. ☆35,918 Updated 9 months ago
- An AI-Powered Speech Processing Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Enhancement, Separation, and Target Spe… ☆3,893 Updated 5 months ago
- Multi-lingual large voice generation model, providing inference, training and deployment full-stack ability. ☆19,578 Updated this week
- Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching" ☆14,048 Updated last week
- zero-shot voice conversion & singing voice conversion, with real-time support ☆3,575 Updated 9 months ago
- Use Microsoft Edge's online text-to-speech service from Python WITHOUT needing Microsoft Edge or Windows or an API key ☆9,979 Updated 2 months ago
- AI powered speech denoising and enhancement ☆2,175 Updated last year
- Text-to-Audio/Music Generation ☆2,578 Updated last year
- Speech-to-text, text-to-speech, speaker diarization, speech enhancement, source separation, and VAD using next-gen Kaldi with onnxruntime… ☆10,226 Updated this week
- Multilingual Voice Understanding Model ☆7,497 Updated last month
- 🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production ☆44,516 Updated last year
- AniPortrait: Audio-Driven Synthesis of Photorealistic Portrait Animation ☆5,019 Updated last year
- An open source implementation of Microsoft's VALL-E X zero-shot TTS model. Demo is available in https://plachtaa.github.io/vallex/ ☆7,956 Updated 2 years ago
- A fast multimodal LLM for real-time voice ☆4,349 Updated 2 months ago
- MARS5 speech model (TTS) from CAMB.AI ☆2,813 Updated last year
- Accepted as [NeurIPS 2024] Spotlight Presentation Paper ☆6,383 Updated last year
- A single Gradio + React WebUI with extensions for ACE-Step, Kimi Audio, Piper TTS, GPT-SoVITS, CosyVoice, XTTSv2, DIA, Kokoro, OpenVoice,… ☆2,955 Updated 2 weeks ago
- Controllable and fast Text-to-Speech for over 7000 languages! ☆2,164 Updated 2 weeks ago
- open-source multimodal large language model that can hear, talk while thinking. Featuring real-time end-to-end speech input and streaming… ☆3,524 Updated last year
- Moshi is a speech-text foundation model and full-duplex spoken dialogue framework. It uses Mimi, a state-of-the-art streaming neural audi… ☆9,593 Updated this week
- Speech To Speech: an effort for an open-sourced and modular GPT4-o ☆4,416 Updated this week