stepfun-ai / Step-Audio-EditX
☆119 Updated this week
Alternatives and similar repositories for Step-Audio-EditX
Users who are interested in Step-Audio-EditX are comparing it to the libraries listed below.
- The official code repository for SongPrep: A Preprocessing Framework and End-to-end Model for Full-song Structure Parsing and Lyrics Tran… ☆111 Updated 2 weeks ago
- AudioStory: Generating Long-Form Narrative Audio with Large Language Models ☆283 Updated last month
- Ming-UniAudio: Speech LLM for Joint Understanding, Generation and Editing with Unified Representation ☆305 Updated 2 weeks ago
- [ICML 2025] SongGen: A Single Stage Auto-regressive Transformer for Text-to-Song Generation ☆278 Updated last week
- Official code of the paper: Draw an Audio: Leveraging Multi-Instruction for Video-to-Audio Synthesis. ☆45 Updated last year
- JAM: A Tiny Flow-based Song Generator with Fine-grained Controllability and Aesthetic Alignment ☆111 Updated 3 months ago
- The official code repository for SongBloom: Coherent Song Generation via Interleaved Autoregressive Sketching and Diffusion Refinement ☆628 Updated last week
- ☆78 Updated 6 months ago
- Kling-Foley: Multimodal Diffusion Transformer for High-Quality Video-to-Audio Generation ☆61 Updated 4 months ago
- a text-conditional diffusion probabilistic model capable of generating high fidelity audio. ☆178 Updated last year
- High-quality Text-to-Audio Generation with Efficient Diffusion Transformer ☆314 Updated last month
- LLIA - Enabling Low-Latency Interactive Avatars: Real-Time Audio-Driven Portrait Video Generation with Diffusion Models ☆136 Updated 5 months ago
- ☆466 Updated 5 months ago
- ☆287 Updated 3 months ago
- KeySync: A Robust Approach for Leakage-free Lip Synchronization in High Resolution ☆369 Updated 3 months ago
- ☆324 Updated 7 months ago
- MiMo-Audio: Audio Language Models are Few-Shot Learners ☆836 Updated last month
- ☆466 Updated 6 months ago
- DICE-Talk is a diffusion-based emotional talking head generation method that can generate vivid and diverse emotions for speaking portrai… ☆273 Updated 3 months ago
- Text-audio foundation model from Boson AI ☆108 Updated 2 months ago
- Unofficial WIP LoRa Finetuning repository for VibeVoice ☆245 Updated last month
- ☆62 Updated 4 months ago
- [CVPR-2025] The official code of HunyuanPortrait: Implicit Condition Control for Enhanced Portrait Animation ☆308 Updated 5 months ago
- project for skyreels-a3 ☆76 Updated 3 months ago
- LiveCC: Learning Video LLM with Streaming Speech Transcription at Scale (CVPR 2025) ☆302 Updated 2 weeks ago
- Ming - facilitating advanced multimodal understanding and generation capabilities built upon the Ling LLM. ☆532 Updated last week
- The official GitHub Page for MiniMax ☆58 Updated last week
- This is the official repo for the paper "LongCat-Flash-Omni Technical Report" ☆373 Updated last week
- Lynx: Towards High-Fidelity Personalized Video Generation ☆282 Updated last month
- ☆526 Updated last month