FunAudioLLM / InspireMusic
InspireMusic: A Unified Framework for Music, Song, Audio Generation.
☆1,035 · Updated last week
Alternatives and similar repositories for InspireMusic:
Users who are interested in InspireMusic are comparing it to the repositories listed below:
- Di♪♪Rhythm: Blazingly Fast and Embarrassingly Simple End-to-End Full-Length Song Generation with Latent Diffusion ☆1,410 · Updated last week
- TangoFlux: Super Fast and Faithful Text to Audio Generation with Flow Matching ☆698 · Updated last month
- An Industrial-Level Controllable and Efficient Zero-Shot Text-To-Speech System ☆702 · Updated this week
- [CVPR 2025] Taming Multimodal Joint Training for High-Quality Video-to-Audio Synthesis ☆1,271 · Updated 3 weeks ago
- ☆386 · Updated this week
- FoleyCrafter: Bring Silent Videos to Life with Lifelike and Synchronized Sounds. An AI Foley master that adds vivid, synchronized sound effects to your silent videos 😝 ☆566 · Updated 8 months ago
- An Open-Sourced LLM-empowered Foundation TTS System ☆663 · Updated 5 months ago
- Zero-shot voice conversion & singing voice conversion, with real-time support ☆2,162 · Updated 2 weeks ago
- Memory-Guided Diffusion for Expressive Talking Video Generation ☆773 · Updated 2 months ago
- Diffusion-based Portrait and Animal Animation ☆736 · Updated last month
- Interface for OuteTTS models. ☆961 · Updated last month
- OpenMusic: SOTA Text-to-music (TTM) Generation ☆544 · Updated last month
- YuE: Open Full-song Generation Foundation for the GPU Poor ☆362 · Updated last month
- ☆214 · Updated 3 weeks ago
- An AI-Powered Speech Processing Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Enhancement, Separation, and Target Spe… ☆2,542 · Updated last week
- LLaSA: Scaling Train-time and Inference-time Compute for LLaMA-based Speech Synthesis ☆492 · Updated 3 weeks ago
- NotaGen: Advancing Musicality in Symbolic Music Generation with Large Language Model Training Paradigms ☆925 · Updated last week
- SkyReels V1: The first and most advanced open-source human-centric video foundation model ☆1,958 · Updated last month
- Taming Stable Diffusion for Lip Sync! ☆3,495 · Updated 2 weeks ago
- High-quality Text-to-Audio Generation with Efficient Diffusion Transformer ☆261 · Updated last month
- Sonic is a method from "Shifting Focus to Global Audio Perception in Portrait Animation"; you can use it in ComfyUI ☆828 · Updated last month
- Dolphin is a multilingual, multitask ASR model jointly trained by DataoceanAI and Tsinghua University. ☆358 · Updated this week
- A Training-free Iterative Framework for Long Story Visualization ☆865 · Updated 2 months ago
- ☆1,240 · Updated 9 months ago
- Inference code for the paper "Spirit-LM Interleaved Spoken and Written Language Model". ☆895 · Updated 5 months ago
- Ultimate Vocal Remover 5 with Gradio UI. Separate an audio file into various stems, using multiple models ☆344 · Updated last week
- Open-source industrial-grade ASR models supporting Mandarin, Chinese dialects and English, achieving a new SOTA on public Mandarin ASR be… ☆867 · Updated 2 weeks ago
- ☆353 · Updated 8 months ago
- The official HelloMeme GitHub site ☆587 · Updated last week
- ☆719 · Updated last month