Ask&Confirm: Active Detail Enriching for Cross-Modal Retrieval with Partial Query (ICCV2021)
☆20 · Dec 4, 2021 · Updated 4 years ago
Alternatives and similar repositories for Ask-Confirm
Users that are interested in Ask-Confirm are comparing it to the libraries listed below
- [NeurIPS 2019] Drill-down: Interactive Retrieval of Complex Scenes using Natural Language Queries ☆12 · Apr 15, 2022 · Updated 3 years ago
- Learning Cross-modal Retrieval with Noisy Labels (CVPR 2021, PyTorch Code) ☆13 · Apr 7, 2021 · Updated 4 years ago
- The Pytorch implementation for "Video-Text Pre-training with Learned Regions" ☆43 · Jul 15, 2022 · Updated 3 years ago
- COLA: Evaluate how well your vision-language model can Compose Objects Localized with Attributes! ☆25 · Nov 23, 2024 · Updated last year
- USER: Unified Semantic Enhancement with Momentum Contrast for Image-Text Retrieval, TIP 2024 ☆33 · Jun 18, 2025 · Updated 8 months ago
- ☆12 · Mar 12, 2023 · Updated 2 years ago
- Data release for Step Differences in Instructional Video (CVPR24) ☆14 · Jun 19, 2024 · Updated last year
- Extended COCO Validation (ECCV) Caption dataset (ECCV 2022) ☆56 · Mar 1, 2024 · Updated 2 years ago
- Extracting optical flow based on GPU in Opencv3 ☆12 · Jul 29, 2019 · Updated 6 years ago
- TEMPURA enables video-language models to reason about causal event relationships and generate fine-grained, timestamped descriptions of u… ☆25 · Jun 4, 2025 · Updated 9 months ago
- Implementation of our AAAI2022 paper, Show Your Faith: Cross-Modal Confidence-Aware Network for Image-Text Matching. ☆36 · Jun 16, 2023 · Updated 2 years ago
- Adaptive Offline Quintuplet Loss for Image-Text Matching (AOQ) ☆34 · Jul 2, 2020 · Updated 5 years ago
- [Arxiv2022] Revitalize Region Feature for Democratizing Video-Language Pre-training ☆22 · Mar 19, 2022 · Updated 3 years ago
- Visual Delta Generator with Large Multi-modal Model for Semi-supervised Composed Image Retrieval - CVPR2024 ☆21 · May 30, 2024 · Updated last year
- 🏆 The 1st Place Submission to AICity Challenge 2021 Natural Language-Based Vehicle Retrieval Track (Alibaba-UTS submission) ☆94 · Apr 28, 2021 · Updated 4 years ago
- [Findings of EMNLP 2022] AssistSR: Task-oriented Video Segment Retrieval for Personal AI Assistant ☆23 · Sep 11, 2023 · Updated 2 years ago
- Code and Models for "GeneCIS A Benchmark for General Conditional Image Similarity" ☆61 · Jun 12, 2023 · Updated 2 years ago
- Code and benchmarks for the Semantic Video Retrieval Task ☆53 · Oct 18, 2022 · Updated 3 years ago
- Learning Cross-Modal Retrieval with Noisy Labels (CVPR 2021, PyTorch Code) ☆55 · Mar 5, 2023 · Updated 2 years ago
- Official code for WACV 2021 paper - Compositional Learning of Image-Text Query for Image Retrieval ☆56 · Oct 8, 2021 · Updated 4 years ago
- Modality-Agnostic Attention Fusion for visual search with text feedback ☆25 · Mar 21, 2023 · Updated 2 years ago
- Official implementation and dataset for the NAACL 2024 paper "ComCLIP: Training-Free Compositional Image and Text Matching" ☆37 · Aug 18, 2024 · Updated last year
- 🔥 [ICLR 2025] Official Benchmark Toolkits for "Visual Haystacks: A Vision-Centric Needle-In-A-Haystack Benchmark" ☆39 · Nov 21, 2025 · Updated 3 months ago
- Official repository of "Interactive Text-to-Image Retrieval with Large Language Models: A Plug-and-Play Approach" (ACL 2024 Oral) ☆34 · Mar 24, 2025 · Updated 11 months ago
- Dynamic Modality Interaction Modeling for Image-Text Retrieval. SIGIR'21 ☆70 · May 26, 2022 · Updated 3 years ago
- ☆30 · May 7, 2021 · Updated 4 years ago
- [CVPR 2022] Visual Abductive Reasoning ☆124 · Oct 22, 2024 · Updated last year
- Official PyTorch implementation of our CVPR 2022 paper: Beyond a Pre-Trained Object Detector: Cross-Modal Textual and Visual Context for … ☆61 · Oct 21, 2022 · Updated 3 years ago
- Code for "Contextual Non-Local Alignment over Full-Scale Representation for Text-Based Person Search" ☆63 · Apr 16, 2021 · Updated 4 years ago
- ☆26 · Jan 12, 2022 · Updated 4 years ago
- [CVPR23 Highlight] CREPE: Can Vision-Language Foundation Models Reason Compositionally? ☆35 · Apr 27, 2023 · Updated 2 years ago
- ☆34 · Mar 10, 2023 · Updated 2 years ago
- Code for ECCV 2020 paper - LEMMA: A Multi-view Dataset for LEarning Multi-agent Multi-task Activities ☆30 · Apr 8, 2021 · Updated 4 years ago
- [CVPR2023] The code for 《Position-guided Text Prompt for Vision-Language Pre-training》 ☆151 · Jun 7, 2023 · Updated 2 years ago
- The Paper List of Large Multi-Modality Model (Perception, Generation, Unification), Parameter-Efficient Finetuning, Vision-Language Pretr… ☆443 · Sep 25, 2025 · Updated 5 months ago
- PyTorch GPU distributed training code for MIL-NCE HowTo100M ☆219 · Jul 5, 2022 · Updated 3 years ago
- RS Generate dataset ☆16 · Jan 2, 2025 · Updated last year
- ☆12 · Sep 11, 2021 · Updated 4 years ago
- [AAAI2024] An official pytorch implement of the paper: Vision-Language Pre-training with Object Contrastive Learning for 3D Scene Underst… ☆13 · Dec 8, 2024 · Updated last year