multimodal-art-projection / AutoMV
☆72Updated last week
Alternatives and similar repositories for AutoMV
Users that are interested in AutoMV are comparing it to the libraries listed below
- Official code of the paper: Draw an Audio: Leveraging Multi-Instruction for Video-to-Audio Synthesis.☆45Updated last year
- An official implementation of SwapAnyone.☆72Updated 9 months ago
- Music production for silent film clips.☆31Updated 8 months ago
- ☆132Updated 6 months ago
- ☆78Updated 8 months ago
- Kling-Foley: Multimodal Diffusion Transformer for High-Quality Video-to-Audio Generation☆62Updated 6 months ago
- ☆91Updated 4 months ago
- Official Repo for MoCha Towards Movie-Grade Talking Character Synthesis☆60Updated 2 weeks ago
- Official implementation of the paper "Bind-Your-Avatar: Multi-Talking-Character Video Generation with Dynamic 3D-mask-based Embedding Rou…☆29Updated 3 months ago
- A one-stop library to standardize the inference and evaluation of all the conditional video generation models.☆50Updated 10 months ago
- ☆145Updated 5 months ago
- SoulX-FlashTalk is the first 14B model to achieve a sub-second start-up latency (0.87s) while sustaining a real-time throughput of 32 FPS☆72Updated last week
- ☆33Updated 2 months ago
- TheaterGen: Character Management with LLM for Consistent Multi-turn Image Generation☆67Updated last year
- The official code repository for SongPrep: A Preprocessing Framework and End-to-end Model for Full-song Structure Parsing and Lyrics Tran…☆131Updated last month
- ☆71Updated last month
- ☆20Updated last year
- Official implementation of "VSTAR: Generative Temporal Nursing for Longer Dynamic Video Synthesis"☆20Updated 11 months ago
- LLaVA combined with the Magvit image tokenizer, training an MLLM without a Vision Encoder. Unifying image understanding and generation.☆39Updated last year
- Official implementation of MagicFace: Training-free Universal-Style Human Image Customized Synthesis.☆65Updated last year
- Glance: Accelerating Diffusion Models with 1 Sample☆147Updated 2 weeks ago
- ☆18Updated 6 months ago
- ViSAudio: End-to-End Video-Driven Binaural Spatial Audio Generation☆109Updated last month
- PyTorch implementation of MIMO, Controllable Character Video Synthesis with Spatial Decomposed Modeling, from Alibaba Intelligence Group☆136Updated last year
- Official PyTorch implementation of "Omni-AVSR: Towards Unified Multimodal Speech Recognition with Large Language Models".☆28Updated last week
- [AAAI 2026] Minute-Long Videos with Dual Parallelisms☆43Updated last month
- Blending Custom Photos with Video Diffusion Transformers☆48Updated 11 months ago
- ☆11Updated last year
- ☆81Updated 10 months ago
- DMM: Building a Versatile Image Generation Model via Distillation-Based Model Merging☆45Updated 8 months ago