Daria8976 / MMAD
We propose MMAD, a novel automated pipeline for precise AD (audio description) generation. MMAD introduces ambient music alongside the visual and linguistic modalities, enhancing the model's multimodal representation learning through modality encoders and alignment.
★15 · Updated 8 months ago
Alternatives and similar repositories for MMAD
Users who are interested in MMAD are comparing it to the repositories listed below.
- This is the official implementation of the CVPR 2024 paper "EmoGen: Emotional Image Content Generation with Text-to-Image Diffusion Models". ★87 · Updated 8 months ago
- R2-Tuning: Efficient Image-to-Video Transfer Learning for Video Temporal Grounding (ECCV 2024) ★88 · Updated last year
- Narrative movie understanding benchmark ★77 · Updated 3 months ago
- [CVPR 2024] Intelligent Grimm - Open-ended Visual Storytelling via Latent Diffusion Models ★255 · Updated 9 months ago
- [CVPR 2024] Bridging the Gap: A Unified Video Comprehension Framework for Moment Retrieval and Highlight Detection ★106 · Updated last year
- [ECCV 2024] ShareGPT4V: Improving Large Multi-modal Models with Better Captions ★234 · Updated last year
- ★43 · Updated 10 months ago
- [CVPR 2025] Number it: Temporal Grounding Videos like Flipping Manga ★121 · Updated 3 weeks ago
- Official repository of the NeurIPS 2024 D&B Track paper "VERIFIED: A Video Corpus Moment Retrieval Benchmark for Fine-Grained Video Understan…" ★36 · Updated 8 months ago
- What Is a Good Caption? A Comprehensive Visual Caption Benchmark for Evaluating Both Correctness and Thoroughness ★23 · Updated 4 months ago
- Official implementation of the paper "AdaReTaKe: Adaptive Redundancy Reduction to Perceive Longer for Video-language Understanding" ★82 · Updated 5 months ago
- [CVPR 2025] Online Video Understanding: OVBench and VideoChat-Online ★66 · Updated last month
- ★192 · Updated last year
- This is the official implementation of the ICCV 2025 paper "Flash-VStream: Efficient Real-Time Understanding for Long Video Streams". ★235 · Updated 2 months ago
- ★26 · Updated 5 months ago
- [ICCV 2025] Official repository of the paper "ViSpeak: Visual Instruction Feedback in Streaming Videos" ★40 · Updated 2 months ago
- [ICLR 2025] TRACE: Temporal Grounding Video LLM via Causal Event Modeling ★124 · Updated last month
- Use 2 lines to empower absolute time awareness for Qwen2.5VL's MRoPE ★23 · Updated last week
- Official code of SmartEdit [CVPR 2024 Highlight] ★357 · Updated last year
- [WWW 2025] Official PyTorch code for "CTR-Driven Advertising Image Generation with Multimodal Large Language Models" ★56 · Updated last month
- A large-scale dataset for training and evaluating a model's ability on dense text image generation ★79 · Updated last month
- [CVPR 2024 Highlight] Official PyTorch implementation of the paper "VTimeLLM: Empower LLM to Grasp Video Moments". ★287 · Updated last year
- An early exploration to introduce interleaving reasoning to the text-to-image generation field and achieve the SoTA benchmark perform… ★58 · Updated last week
- TokLIP: Marry Visual Tokens to CLIP for Multimodal Comprehension and Generation ★217 · Updated last month
- ★156 · Updated 8 months ago
- [EMNLP 2025 Findings] Grounded-VideoLLM: Sharpening Fine-grained Temporal Grounding in Video Large Language Models ★125 · Updated last month
- [CVPR 2024] MotionEditor is the first diffusion-based model capable of video motion editing. ★178 · Updated 2 weeks ago
- Official implementation of the paper "ReTaKe: Reducing Temporal and Knowledge Redundancy for Long Video Understanding" ★36 · Updated 6 months ago
- Structured Video Comprehension of Real-World Shorts ★201 · Updated this week
- ★78 · Updated 6 months ago