Daria8976 / MMAD
We propose MMAD, a novel automated pipeline for precise audio description (AD) generation. MMAD introduces ambient music alongside the visual and linguistic modalities, enhancing the model's multimodal representation learning through modality encoders and alignment.
☆16 · Updated last year
Alternatives and similar repositories for MMAD
Users that are interested in MMAD are comparing it to the libraries listed below
- R2-Tuning: Efficient Image-to-Video Transfer Learning for Video Temporal Grounding (ECCV 2024) ☆90 · Updated last year
- Narrative movie understanding benchmark ☆76 · Updated 7 months ago
- Structured Video Comprehension of Real-World Shorts ☆230 · Updated 4 months ago
- This is the official implementation of the CVPR 2024 paper "EmoGen: Emotional Image Content Generation with Text-to-Image Diffusion Models". ☆92 · Updated 3 months ago
- Official implementation of LiFT: Leveraging Human Feedback for Text-to-Video Model Alignment. ☆85 · Updated 9 months ago
- ☆48 · Updated last year
- [CVPR 2024] Bridging the Gap: A Unified Video Comprehension Framework for Moment Retrieval and Highlight Detection ☆114 · Updated last year
- TimeLens: Rethinking Video Temporal Grounding with Multimodal LLMs ☆101 · Updated last week
- [ECCV 2024] ShareGPT4V: Improving Large Multi-modal Models with Better Captions ☆248 · Updated last year
- ☆82 · Updated 10 months ago
- [NeurIPS 2024] The official implementation of the research paper "FreeLong: Training-Free Long Video Generation with SpectralBlend Temporal Atten… ☆64 · Updated 7 months ago
- ☆27 · Updated 9 months ago
- [CVPR 2025] Online Video Understanding: OVBench and VideoChat-Online ☆88 · Updated 4 months ago
- Code release for our NeurIPS 2024 Spotlight paper "GenArtist: Multimodal LLM as an Agent for Unified Image Generation and Editing" ☆160 · Updated last year
- [EMNLP 2025 Findings] Grounded-VideoLLM: Sharpening Fine-grained Temporal Grounding in Video Large Language Models ☆139 · Updated 5 months ago
- [CVPR 2024] Intelligent Grimm - Open-ended Visual Storytelling via Latent Diffusion Models ☆262 · Updated last year
- UniGenBench++: A Unified Semantic Evaluation Benchmark for Text-to-Image Generation ☆121 · Updated last month
- [WWW 2025] Official PyTorch Code for "CTR-Driven Advertising Image Generation with Multimodal Large Language Models" ☆62 · Updated 6 months ago
- [NeurIPS 2025 D&B] ImgEdit: A Unified Image Editing Dataset and Benchmark ☆275 · Updated 3 months ago
- LongVT: Incentivizing "Thinking with Long Videos" via Native Tool Calling ☆187 · Updated 2 weeks ago
- [ICLR 2025] TRACE: Temporal Grounding Video LLM via Causal Event Modeling ☆143 · Updated 5 months ago
- What Is a Good Caption? A Comprehensive Visual Caption Benchmark for Evaluating Both Correctness and Thoroughness ☆26 · Updated 8 months ago
- Offline implementation of UniREditBench: A Unified Reasoning-based Image Editing Benchmark. ☆52 · Updated last month
- [ICLR 2025] AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark ☆138 · Updated 8 months ago
- ☆203 · Updated last year
- ☆62 · Updated 6 months ago
- Video Generation Benchmark ☆68 · Updated 8 months ago
- [ICLR 2026] This is an early exploration introducing Interleaving Reasoning to the text-to-image generation field and achieving the SoTA bench… ☆86 · Updated 2 weeks ago
- (ICLR 2026) Official repository of 'ScaleCap: Inference-Time Scalable Image Captioning via Dual-Modality Debiasing' ☆58 · Updated 2 weeks ago
- [CVPR 2025] A Hierarchical Movie Level Dataset for Long Video Generation ☆85 · Updated 10 months ago