TencentARC / AudioStory
AudioStory: Generating Long-Form Narrative Audio with Large Language Models
☆295 · Updated 4 months ago
Alternatives and similar repositories for AudioStory
Users who are interested in AudioStory are comparing it to the libraries listed below.
- ViSAudio: End-to-End Video-Driven Binaural Spatial Audio Generation ☆111 · Updated last month
- ☆77 · Updated 8 months ago
- [NeurIPS'25 Spotlight] Official implementation of "JavisGPT: A Unified Multi-modal LLM for Sounding-Video Comprehension and Generation" ☆68 · Updated 3 weeks ago
- LiveCC: Learning Video LLM with Streaming Speech Transcription at Scale (CVPR 2025) ☆371 · Updated 3 months ago
- 🎨 A Style is Worth One Code: Unlocking Code-to-Style Image Generation with Discrete Style Space ☆154 · Updated last month
- A real-time streaming conversational video system that transforms text interactions into continuous, high-fidelity video responses using … ☆289 · Updated last month
- OmniVinci is an omni-modal LLM for joint understanding of vision, audio, and language. ☆628 · Updated 3 months ago
- We present FlashPortrait, an end-to-end video diffusion transformer capable of synthesizing ID-preserving, infinite-length videos while a… ☆432 · Updated 3 weeks ago
- The official code repository for SongPrep: A Preprocessing Framework and End-to-end Model for Full-song Structure Parsing and Lyrics Tran… ☆147 · Updated last month
- ☆240 · Updated 3 weeks ago
- Official implementation for "Story2Board: A Training‑Free Approach for Expressive Storyboard Generation" ☆226 · Updated 5 months ago
- ☆146 · Updated last month
- Ming - facilitating advanced multimodal understanding and generation capabilities built upon the Ling LLM. ☆575 · Updated 3 months ago
- A powerful 3B-parameter, LLM-based Reinforcement Learning audio edit model excels at editing emotion, speaking style, and paralinguistics… ☆836 · Updated this week
- Ovis-Image is a 7B text-to-image model specifically optimized for high-quality text rendering, designed to operate efficiently under stri… ☆301 · Updated last month
- The official repo for "Vidi: Large Multimodal Models for Video Understanding and Editing" ☆565 · Updated last week
- HunyuanImage-2.1: An Efficient Diffusion Model for High-Resolution (2K) Text-to-Image Generation ☆671 · Updated 3 months ago
- FIBO is a SOTA, first open-source, JSON-native text-to-image model built for controllable, predictable, and legally safe image generation… ☆300 · Updated 3 weeks ago
- Official code of the paper: Draw an Audio: Leveraging Multi-Instruction for Video-to-Audio Synthesis. ☆45 · Updated last year
- Lynx: Towards High-Fidelity Personalized Video Generation ☆306 · Updated 4 months ago
- The official GitHub Page for MiniMax ☆61 · Updated 2 months ago
- GLM-Image: Auto-regressive for Dense-knowledge and High-fidelity Image Generation. ☆707 · Updated this week
- [SIGGRAPH Asia 25] Voost: A Unified and Scalable Diffusion Transformer for Bidirectional Virtual Try-On and Try-Off ☆331 · Updated 3 months ago
- An official implementation of SwapAnyone. ☆73 · Updated 10 months ago
- [ICCV 2025] Scaling Inference-Time Optimization for Text-to-Image Diffusion Models via Reflection Tuning ☆211 · Updated 2 months ago
- ☆227 · Updated 6 months ago
- Official implementation of the paper "MusicInfuser: Making Video Diffusion Listen and Dance" ☆80 · Updated 9 months ago
- Project for skyreels-a3 ☆78 · Updated 5 months ago
- Official Code Repo for UniVA: Universal Video Agents ☆305 · Updated 2 months ago
- [ICML 2025] SongGen: A Single Stage Auto-regressive Transformer for Text-to-Song Generation ☆299 · Updated 2 months ago