Daria8976 / MMAD
We propose MMAD, a novel automated pipeline for precise AD generation. MMAD introduces ambient music alongside the visual and linguistic modalities, enhancing the model's multimodal representation learning through modality encoders and alignment.
☆16Updated last year
Alternatives and similar repositories for MMAD
Users who are interested in MMAD are comparing it to the libraries listed below.
- This is the official implementation of the CVPR 2024 paper "EmoGen: Emotional Image Content Generation with Text-to-Image Diffusion Models".☆91Updated 2 months ago
- [CVPR2024] MotionEditor is the first diffusion-based model capable of video motion editing.☆186Updated 3 months ago
- [CVPR 2024] Intelligent Grimm - Open-ended Visual Storytelling via Latent Diffusion Models☆262Updated last year
- Code release for our NeurIPS 2024 Spotlight paper "GenArtist: Multimodal LLM as an Agent for Unified Image Generation and Editing"☆158Updated last year
- Narrative movie understanding benchmark☆76Updated 7 months ago
- Official implementation of LiFT: Leveraging Human Feedback for Text-to-Video Model Alignment.☆84Updated 8 months ago
- 🌀 R2-Tuning: Efficient Image-to-Video Transfer Learning for Video Temporal Grounding (ECCV 2024)☆90Updated last year
- [WWW 2025] Official PyTorch Code for "CTR-Driven Advertising Image Generation with Multimodal Large Language Models"☆61Updated 5 months ago
- [NeurIPS 2024] The official implementation of the research paper "FreeLong: Training-Free Long Video Generation with SpectralBlend Temporal Atten…☆63Updated 6 months ago
- [ICCV 2025] CreatiLayout: Siamese Multimodal Diffusion Transformer for Creative Layout-to-Image Generation☆123Updated 5 months ago
- [ICLR 2025] VideoGrain: This repo is the official implementation of "VideoGrain: Modulating Space-Time Attention for Multi-Grained Video …☆159Updated 9 months ago
- ☆48Updated last year
- LayoutDiT: Exploring Content-Graphic Balance in Layout Generation with Diffusion Transformer☆49Updated this week
- The HD-VG-130M Dataset☆120Updated last year
- [ECCV 2024] ShareGPT4V: Improving Large Multi-modal Models with Better Captions☆247Updated last year
- ☆50Updated 6 months ago
- 【CVPR 2025 Oral】Official Repo for Paper "AnyEdit: Mastering Unified High-Quality Image Editing for Any Idea"☆209Updated 9 months ago
- Official code of SmartEdit [CVPR-2024 Highlight]☆369Updated last year
- ☆158Updated 11 months ago
- [CVPR 2024] Official PyTorch implementation of FreeCustom: Tuning-Free Customized Image Generation for Multi-Concept Composition☆171Updated 4 months ago
- [CVPR 2025] T2V-CompBench: A Comprehensive Benchmark for Compositional Text-to-video Generation☆103Updated 2 months ago
- MotionSight's official code implementation.☆44Updated 3 months ago
- UniEdit: A Unified Tuning-Free Framework for Video Motion and Appearance Editing☆115Updated 8 months ago
- ☆201Updated last year
- ☆27Updated 8 months ago
- [NeurIPS 2025 D&B🔥] ImgEdit: A Unified Image Editing Dataset and Benchmark☆263Updated 2 months ago
- [ICLR 2025] Official code implementation of DreamBench++: A Human-Aligned Benchmark for Personalized Image Generation☆129Updated 10 months ago
- (CVPR 2024) Official code for paper "Towards Language-Driven Video Inpainting via Multimodal Large Language Models"☆99Updated last year
- This is a collection of recent papers on reasoning in video generation models.☆91Updated last week
- [NIPS 25'] Evaluation code of paper "KRIS-Bench: Benchmarking Next-Level Intelligent Image Editing Models"☆36Updated 2 months ago