We propose MMAD, a novel automated pipeline for precise audio description (AD) generation. MMAD introduces ambient music alongside the visual and linguistic modalities, enhancing the model's multimodal representation learning through modality encoders and alignment.
☆17 · Dec 31, 2024 · Updated last year
Alternatives and similar repositories for MMAD
Users that are interested in MMAD are comparing it to the libraries listed below
- Soulstyler: Using Large Language Model to Guide Image Style Transfer for Target Object ☆18 · Dec 1, 2024 · Updated last year
- [ACL 2023] VSTAR is a multimodal dialogue dataset with scene and topic transition information ☆15 · Oct 27, 2024 · Updated last year
- Official implementation of "EG4D: Explicit Generation of 4D Object without Score Distillation" (ICLR 2025) ☆36 · Feb 14, 2025 · Updated last year
- Narrative movie understanding benchmark ☆76 · Jun 11, 2025 · Updated 9 months ago
- Ultraman: Single Image 3D Human Reconstruction with Ultra Speed and Detail ☆16 · Jul 5, 2024 · Updated last year
- The Social-IQ 2.0 Challenge Release for the Artificial Social Intelligence Workshop at ICCV '23 ☆36 · Oct 13, 2023 · Updated 2 years ago
- MRI preprocessing / segmentation in under 30s ☆17 · Mar 13, 2026 · Updated last week
- [COLING 2025] Idea23D: Collaborative LMM Agents Enable 3D Model Generation from Interleaved Multimodal Inputs ☆52 · Jan 22, 2025 · Updated last year
- A plugin for booking sports venues at Tongji University ☆19 · Mar 17, 2025 · Updated last year
- Incorporating the memory mechanism into the transformer and employing a parallel weighting structure to obtain a better utterance-level r… ☆22 · Oct 4, 2025 · Updated 5 months ago
- ☆25 · Jun 29, 2025 · Updated 8 months ago
- Both audio-only and audio-visual speaker diarization datasets are listed here. ☆15 · Feb 22, 2023 · Updated 3 years ago
- HyperCUT: Video Sequence from a Single Blurry Image using Unsupervised Ordering (CVPR'23) ☆14 · Nov 4, 2025 · Updated 4 months ago
- Bilateral Cross-Modality Graph Matching Attention for Feature Fusion in Visual Question Answering ☆11 · Feb 16, 2023 · Updated 3 years ago
- Vapoursynth filter using ProPainter: Improving Propagation and Transformer for Video Inpainting ☆17 · Jan 2, 2026 · Updated 2 months ago
- Official PyTorch implementation of MuVieCAST: Multi-View Consistent Artistic Style Transfer. ☆16 · Jan 22, 2025 · Updated last year
- ☆10 · Nov 27, 2024 · Updated last year
- Internal diffusion for video inpainting ☆15 · May 19, 2025 · Updated 10 months ago
- ☆14 · Jan 9, 2024 · Updated 2 years ago
- Add Rain Streak Mask on Unpaired Image Using GAN ☆10 · Sep 12, 2020 · Updated 5 years ago
- Thermal Indoor Motion Dataset ☆14 · Apr 27, 2023 · Updated 2 years ago
- CVPR 24 paper: Dysen-VDM: Empowering Dynamics-aware Text-to-Video Diffusion with LLMs ☆14 · Mar 19, 2024 · Updated 2 years ago
- A python package of robust and effective defogging/dehazing method ☆15 · Dec 30, 2018 · Updated 7 years ago
- [NeurIPS2023] LoRA: A Logical Reasoning Augmented Dataset for Visual Question Answering ☆13 · Jan 5, 2024 · Updated 2 years ago
- Smart park