frank-chris / ImageTextRetrieval
In this work, we implement different cross-modal learning schemes, such as the Siamese Network, the Correlational Network, and the Deep Cross-Modal Projection Learning model, and study their performance. We also propose a modified Deep Cross-Modal Projection Learning model that uses a different image feature extractor. We evaluate the model’s performance on im…
☆11 Updated 4 years ago
Alternatives and similar repositories for ImageTextRetrieval
Users who are interested in ImageTextRetrieval are comparing it to the libraries listed below.
- ☆22 Updated 3 years ago
- Repository of paper Consistency-preserving Visual Question Answering in Medical Imaging (MICCAI2022) ☆24 Updated 2 years ago
- This repository provides a comprehensive collection of research papers focused on multimodal representation learning, all of which have b… ☆81 Updated 4 months ago
- ☆19 Updated 4 years ago
- Using image captions with LLM for zero-shot VQA ☆17 Updated last year
- Multi-label Image Recognition with Partial Labels (IJCV'24, ESWA'24, AAAI'22) ☆41 Updated last year
- ☆29 Updated 2 years ago
- ☆18 Updated 3 years ago
- A curated list of vision-and-language pre-training (VLP). :-) ☆59 Updated 3 years ago
- ☆16 Updated 3 years ago
- [ICCVW2023] Robust Asymmetric Loss for Multi-Label Long-Tailed Learning ☆17 Updated 2 years ago
- [TIP2023] The code of “Plug-and-Play Regulators for Image-Text Matching” ☆33 Updated last year
- [ACL 2021] Learning Relation Alignment for Calibrated Cross-modal Retrieval ☆32 Updated 2 years ago
- Implementation of our paper, 'Unifying Two-Stream Encoders with Transformers for Cross-Modal Retrieval.' ☆27 Updated last year
- ☆40 Updated 2 years ago
- [CVPRW 2024] LaPA: Latent Prompt Assist Model For Medical Visual Question Answering ☆22 Updated 6 months ago
- ☆46 Updated 3 years ago
- [ACLW'24] LMPT: Prompt Tuning with Class-Specific Embedding Loss for Long-tailed Multi-Label Visual Recognition ☆54 Updated last year
- Official Implementation of "Geometric Multimodal Contrastive Representation Learning" (https://arxiv.org/abs/2202.03390) ☆28 Updated 10 months ago
- ☆17 Updated 4 years ago
- Source code for the paper "A Medical Semantic-Assisted Transformer for Radiographic Report Generation" ☆25 Updated 2 years ago
- The official implementation of 'Align and Attend: Multimodal Summarization with Dual Contrastive Losses' (CVPR 2023) ☆78 Updated 2 years ago
- Code release for Grad-CAM Guided Attention Module for Fine-grained Visual Classification (MLSP 2022) ☆12 Updated 4 years ago
- Paper list of compositional zero-shot learning ☆10 Updated 3 years ago
- Implementation of the Benchmark Approaches for Medical Instructional Video Classification (MedVidCL) and Medical Video Question Answering… ☆30 Updated 2 years ago
- ☆28 Updated 4 years ago
- The official code for "Visual Relationship Detection with Visual-Linguistic Knowledge from Multimodal Representations" (IEEE Access, 2021… ☆17 Updated 3 years ago
- Paper List about Radiology Report Generation and also some medical image captioning ☆11 Updated 4 years ago
- [COLING'25] HGCLIP: Exploring Vision-Language Models with Graph Representations for Hierarchical Understanding ☆43 Updated 11 months ago
- Official repo for "SoftMatch: Addressing the Quantity-Quality Trade-off in Semi-Supervised Learning", accepted by ICLR 2023. ☆21 Updated 2 years ago