96-Zachary / vse_2ad
☆15Updated 2 years ago
Related projects:
- Implementation of our paper, 'Unifying Two-Stream Encoders with Transformers for Cross-Modal Retrieval.'☆19Updated 9 months ago
- ☆43Updated 2 years ago
- Benchmark data for "Rethinking Benchmarks for Cross-modal Image-text Retrieval" (SIGIR 2023)☆21Updated last year
- [TIP2023] The code of “Plug-and-Play Regulators for Image-Text Matching”☆28Updated 5 months ago
- Code for our EMNLP-2022 paper: "Towards Robust Visual Question Answering: Making the Most of Biased Samples via Contrastive Learning"☆12Updated last year
- [ACL 2021] Learning Relation Alignment for Calibrated Cross-modal Retrieval☆30Updated last year
- Official Implementation for CVPR 2023 paper "Divide and Conquer: Answering Questions with Object Factorization and Compositional Reasonin…☆10Updated 3 months ago
- Code for WACV 2023 paper "VLC-BERT: Visual Question Answering with Contextualized Commonsense Knowledge"☆21Updated last year
- Recent Advances in Visual Dialog☆29Updated 2 years ago
- Official implementation of the paper "ALADIN: Distilling Fine-grained Alignment Scores for Efficient Image-Text Matching and Retrieval"☆22Updated 9 months ago
- ☆34Updated last year
- An automatic MLLM hallucination detection framework☆17Updated 11 months ago
- [ICCV 2021] Official implementation of the paper "TRAR: Routing the Attention Spans in Transformers for Visual Question Answering"☆64Updated 2 years ago
- [ICLR 2023] This is the code repo for our ICLR'23 paper "Universal Vision-Language Dense Retrieval: Learning A Unified Representation Spa…☆41Updated 2 months ago
- Implementation of our paper, Your Negative May not Be True Negative: Boosting Image-Text Matching with False Negative Elimination.☆13Updated 9 months ago
- Implementation of our AAAI2022 paper, Show Your Faith: Cross-Modal Confidence-Aware Network for Image-Text Matching.☆36Updated last year
- Video Graph Transformer for Video Question Answering (ECCV'22)☆44Updated last year
- 【ICLR 2024, Spotlight】Sentence-level Prompts Benefit Composed Image Retrieval☆60Updated 5 months ago
- Code and data for ImageCoDe, a contextual vision-and-language benchmark☆39Updated 6 months ago
- A Comprehensive Empirical Study of Vision-Language Pre-trained Model for Supervised Cross-Modal Retrieval☆40Updated 2 years ago
- Implementation for the paper "Unified Multimodal Model with Unlikelihood Training for Visual Dialog"☆13Updated last year
- ☆24Updated 5 months ago
- [Paper][IJCKG 2022] LaKo: Knowledge-driven Visual Question Answering via Late Knowledge-to-Text Injection☆25Updated 7 months ago
- Source code of our AAAI 2024 paper "Cross-Modal and Uni-Modal Soft-Label Alignment for Image-Text Retrieval"☆21Updated 5 months ago
- ☆6Updated last year
- Research code for "KAT: A Knowledge Augmented Transformer for Vision-and-Language"☆55Updated 2 years ago
- Code for the paper "A Symmetric Dual Encoding Dense Retrieval Framework for Knowledge-Intensive Visual Question Answering"☆11Updated last year
- Cross-View Language Modeling: Towards Unified Cross-Lingual Cross-Modal Pre-training (ACL 2023)☆85Updated last year
- Official repository of "Interactive Text-to-Image Retrieval with Large Language Models: A Plug-and-Play Approach" (ACL 2024 Oral)☆16Updated last month
- (ACL'2023) MultiCapCLIP: Auto-Encoding Prompts for Zero-Shot Multilingual Visual Captioning☆34Updated last month