kszpxxzmc / ViSAudio
ViSAudio: End-to-End Video-Driven Binaural Spatial Audio Generation
☆48 Updated last week
Alternatives and similar repositories for ViSAudio
Users who are interested in ViSAudio are comparing it to the repositories listed below.
- Official code of the paper: Draw an Audio: Leveraging Multi-Instruction for Video-to-Audio Synthesis. ☆45 Updated last year
- ☆28 Updated 8 months ago
- Music production for silent film clips. ☆29 Updated 7 months ago
- ☆78 Updated 7 months ago
- BeltOut: An open source pitch-perfect voice-to-voice timbre transfer model based on ChatterboxVC ☆79 Updated 4 months ago
- The official code repository for SongPrep: A Preprocessing Framework and End-to-end Model for Full-song Structure Parsing and Lyrics Tran… ☆121 Updated 2 weeks ago
- An official implementation of SwapAnyone. ☆71 Updated 8 months ago
- MTVCraft: An Open Veo3-style Audio-Video Generation Demo ☆87 Updated 2 months ago
- JAM: A Tiny Flow-based Song Generator with Fine-grained Controllability and Aesthetic Alignment ☆113 Updated 4 months ago
- Kling-Foley: Multimodal Diffusion Transformer for High-Quality Video-to-Audio Generation ☆61 Updated 5 months ago
- AudioStory: Generating Long-Form Narrative Audio with Large Language Models ☆291 Updated 2 months ago
- Official implementation of Progressive Detail Injection for Training-Free Semantic Binding in Text-to-Image Generation ☆32 Updated 4 months ago
- ☆106 Updated 3 months ago
- The official implementation of OmniFlow: Any-to-Any Generation with Multi-Modal Rectified Flows ☆119 Updated 3 months ago
- ☆58 Updated last week
- [ICCV 2025] Scaling Inference-Time Optimization for Text-to-Image Diffusion Models via Reflection Tuning ☆208 Updated last month
- ☆46 Updated 3 weeks ago
- ☆34 Updated last week
- ☆17 Updated 10 months ago
- Krea Realtime 14B. An open-source realtime AI video model. ☆415 Updated 3 weeks ago
- ☆65 Updated this week
- [CVPR 2025 GMCV] Test-Time Frequency Scaling: Instant Frequency Control for Any Diffusion Model ☆55 Updated 6 months ago
- ☆80 Updated 9 months ago
- Towards Fine-grained Audio Captioning with Multimodal Contextual Cues ☆85 Updated 2 months ago
- ☆227 Updated 4 months ago
- This is the official implementation of "T-LoRA: Single Image Diffusion Model Customization Without Overfitting" ☆125 Updated 5 months ago
- [AAAI 2026] UltraGen ☆69 Updated last month
- Unlimited-length talking video generation that supports image-to-video and video-to-video generation ☆48 Updated 3 months ago
- Official implementation for "Story2Board: A Training‑Free Approach for Expressive Storyboard Generation" ☆180 Updated 3 months ago
- LIA-X: Interpretable Latent Portrait Animator ☆91 Updated 2 months ago