open-mmlab / Amphion
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.
☆9,192 · Updated last month
Alternatives and similar repositories for Amphion
Users who are interested in Amphion are comparing it to the libraries listed below.
- Inference and training library for high-quality TTS models. ☆5,314 · Updated 6 months ago
- Moshi is a speech-text foundation model and full-duplex spoken dialogue framework. It uses Mimi, a state-of-the-art streaming neural audi… ☆8,484 · Updated this week
- Instant voice cloning by MIT and MyShell. Audio foundation model. ☆32,733 · Updated 2 months ago
- High-quality multilingual text-to-speech library by MyShell.ai. Supports English, Spanish, French, Chinese, Japanese, and Korean. ☆6,198 · Updated 6 months ago
- Zero-Shot Speech Editing and Text-to-Speech in the Wild ☆8,304 · Updated 3 months ago
- StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models ☆5,817 · Updated 10 months ago
- Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching" ☆12,466 · Updated this week
- SOTA Open Source TTS ☆22,039 · Updated 2 weeks ago
- Gradio WebUI for creators and developers, featuring key TTS (Edge-TTS, kokoro) and zero-shot Voice Cloning (E2 & F5-TTS, CosyVoice), with… ☆3,725 · Updated last month
- Foundational model for human-like, expressive TTS ☆4,136 · Updated 11 months ago
- An Open Source text-to-speech system built by inverting Whisper. ☆4,288 · Updated 3 weeks ago
- Distilled variant of Whisper for speech recognition. 6x faster, 50% smaller, within 1% word error rate. ☆3,893 · Updated 5 months ago
- A fast multimodal LLM for real-time voice ☆4,060 · Updated 4 months ago
- Multilingual large voice generation model, providing full-stack inference, training, and deployment capabilities. ☆14,798 · Updated this week
- EmotiVoice 😊: a Multi-Voice and Prompt-Controlled TTS Engine ☆8,058 · Updated 10 months ago
- LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve spee… ☆2,942 · Updated last month
- A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity… ☆11,236 · Updated this week
- zero-shot voice conversion & singing voice conversion, with real-time support ☆2,681 · Updated 2 months ago
- Foundational Models for State-of-the-Art Speech and Text Translation ☆11,565 · Updated 7 months ago
- An open-source implementation of Microsoft's VALL-E X zero-shot TTS model. A demo is available at https://plachtaa.github.io/vallex/ ☆7,888 · Updated last year
- Open-source, accurate, and easy-to-use video speech recognition & clipping tool with integrated LLM-based AI clipping. ☆4,708 · Updated 3 months ago
- OCR, layout analysis, reading order, table recognition in 90+ languages ☆17,709 · Updated this week
- Official implementation code of the paper <AnyText: Multilingual Visual Text Generation And Editing> ☆4,692 · Updated 3 months ago
- [SIGGRAPH Asia 2022] VideoReTalking: Audio-based Lip Synchronization for Talking Head Video Editing In the Wild ☆7,092 · Updated 10 months ago
- PhotoMaker [CVPR 2024] ☆10,000 · Updated 8 months ago
- Generative models for conditional audio generation ☆3,341 · Updated 3 weeks ago
- text and image to video generation: CogVideoX (2024) and CogVideo (ICLR 2023) ☆11,610 · Updated last week
- Local realtime voice AI ☆2,328 · Updated 3 months ago
- ☆8,486 · Updated last year
- ☆4,384 · Updated 2 weeks ago