frank-chris / ImageTextRetrieval
In this work, we implement different cross-modal learning schemes, such as the Siamese Network, the Correlational Network, and the Deep Cross-Modal Projection Learning model, and study their performance. We also propose a modified Deep Cross-Modal Projection Learning model that uses a different image feature extractor. We evaluate the model’s performance on im…
☆11 Updated 4 years ago
Alternatives and similar repositories for ImageTextRetrieval
Users that are interested in ImageTextRetrieval are comparing it to the libraries listed below
- [TIP2023] The code of “Plug-and-Play Regulators for Image-Text Matching” ☆34 Updated last year
- ☆28 Updated 2 years ago
- ☆24 Updated 3 years ago
- ☆15 Updated 3 years ago
- Using image captions with LLM for zero-shot VQA ☆18 Updated last year
- Repository of paper Consistency-preserving Visual Question Answering in Medical Imaging (MICCAI2022) ☆25 Updated 2 years ago
- ☆47 Updated 3 weeks ago
- Implementation of our paper, 'Unifying Two-Stream Encoders with Transformers for Cross-Modal Retrieval.' ☆28 Updated 2 years ago
- ☆17 Updated 2 years ago
- ☆36 Updated last year
- [ACL 2021] Learning Relation Alignment for Calibrated Cross-modal Retrieval ☆34 Updated 2 years ago
- ☆52 Updated 2 years ago
- This is the repo for "Adaptive Unimodal Regulation for Balanced Multimodal Information Acquisition", CVPR2025. ☆20 Updated last month
- Paper reading notes in the field of Image-Text Matching/Retrieval. ☆13 Updated 3 years ago
- A curated list of vision-and-language pre-training (VLP). :-) ☆62 Updated 3 years ago
- [ICCVW'23] Robust Asymmetric Loss for Multi-Label Long-Tailed Learning ☆18 Updated 2 years ago
- Official code release for ARTEMIS: Attention-based Retrieval with Text-Explicit Matching and Implicit Similarity (published at ICLR 2022) ☆51 Updated 2 years ago
- ☆79 Updated 2 years ago
- [ACLW'24] LMPT: Prompt Tuning with Class-Specific Embedding Loss for Long-tailed Multi-Label Visual Recognition ☆57 Updated last year
- Source code for the paper "A Medical Semantic-Assisted Transformer for Radiographic Report Generation" ☆25 Updated 2 years ago
- [COLING'25] HGCLIP: Exploring Vision-Language Models with Graph Representations for Hierarchical Understanding ☆44 Updated last year
- ☆34 Updated 3 years ago
- Implementation of our CVPR2022 paper, Negative-Aware Attention Framework for Image-Text Matching. ☆119 Updated 2 years ago
- Multi-label Image Recognition with Partial Labels (IJCV'24, ESWA'24, AAAI'22) ☆43 Updated last year
- Official Implementation for CVPR 2023 paper "Divide and Conquer: Answering Questions with Object Factorization and Compositional Reasonin… ☆10 Updated last year
- Federated Meta-Learning for Emotion and Sentiment Aware Multi-modal Complaint Identification ☆10 Updated last year
- This repository provides a comprehensive collection of research papers focused on multimodal representation learning, all of which have b… ☆83 Updated 7 months ago
- SimVLM ---SIMPLE VISUAL LANGUAGE MODEL PRETRAINING WITH WEAK SUPERVISION ☆36 Updated 3 years ago
- [CVPR 2023] VoP: Text-Video Co-operative Prompt Tuning for Cross-Modal Retrieval ☆38 Updated 2 years ago
- The code of the paper "Cross-Modal Graph Matching Network for Image-Text Retrieval" in ACM Transactions on Multimedia Computing, Communic… ☆47 Updated 2 years ago