adxcreative / D-M
The official source code of our AAAI25 paper "D&M: Enriching E-commerce Videos with Sound Effects by Key Moment Detection and SFX Matching".
☆10 Updated 3 months ago
Alternatives and similar repositories for D-M
Users who are interested in D-M are comparing it to the repositories listed below:
- ☆8 Updated 6 months ago
- Can I Trust Your Answer? Visually Grounded Video Question Answering (CVPR'24, Highlight) ☆72 Updated 11 months ago
- ☆17 Updated this week
- Official code for Zero-shot Referring Expression Comprehension via Structural Similarity Between Images and Captions (CVPR 2024) ☆25 Updated 11 months ago
- A lightweight codebase for referring expression comprehension and segmentation ☆55 Updated 3 years ago
- [NeurIPS 2023] The official implementation of SOC: Semantic-Assisted Object Cluster for Referring Video Object Segmentation ☆31 Updated last year
- Generating Structured Pseudo Labels for Noise-resistant Zero-shot Video Sentence Localization ☆14 Updated last year
- Code for paper "LLMs Can Evolve Continually on Modality for X-Modal Reasoning" NeurIPS2024 ☆36 Updated 5 months ago
- [CVPR 2024] Context-Guided Spatio-Temporal Video Grounding ☆54 Updated 11 months ago
- [NeurIPS 2022] Embracing Consistency: A One-Stage Approach for Spatio-Temporal Video Grounding ☆50 Updated last year
- Official pytorch repository for "Knowing Where to Focus: Event-aware Transformer for Video Grounding" (ICCV 2023) ☆50 Updated last year
- [ICCV 2023] Simple Baselines for Interactive Video Retrieval with Questions and Answers ☆16 Updated last year
- ☆35 Updated last year
- Pytorch Code for "Unified Coarse-to-Fine Alignment for Video-Text Retrieval" (ICCV 2023) ☆65 Updated last year
- ☆92 Updated last year
- [AAAI 2024] DGL: Dynamic Global-Local Prompt Tuning for Text-Video Retrieval. ☆41 Updated 7 months ago
- ☆25 Updated 9 months ago
- Official implementation of HawkEye: Training Video-Text LLMs for Grounding Text in Videos ☆43 Updated last year
- Improving Visual Grounding with Visual-Linguistic Verification and Iterative Reasoning, CVPR 2022 ☆97 Updated 2 years ago
- ☆23 Updated last month
- ☆29 Updated 8 months ago
- [ECCV 22] LocVTP: Video-Text Pre-training for Temporal Localization ☆39 Updated 2 years ago
- [CVPR 2024] Retrieval-Augmented Image Captioning with External Visual-Name Memory for Open-World Comprehension ☆51 Updated last year
- ICLR'24 Official Implementation of Composed Image Retrieval with Text Feedback via Multi-grained Uncertainty Regularization ☆72 Updated last year
- ☆21 Updated last year
- 👾 E.T. Bench: Towards Open-Ended Event-Level Video-Language Understanding (NeurIPS 2024) ☆58 Updated 4 months ago
- Code implementation of paper "MUSE: Mamba is Efficient Multi-scale Learner for Text-video Retrieval (AAAI2025)" ☆21 Updated 4 months ago
- ☆30 Updated last year
- [TPAMI 2024] This is the Pytorch code for our paper "Context Disentangling and Prototype Inheriting for Robust Visual Grounding". ☆17 Updated last month
- [ECCV 2024] EgoCVR: An Egocentric Benchmark for Fine-Grained Composed Video Retrieval ☆38 Updated last month