UKPLab / MMT-Retrieval
☆129 Updated last year
Related projects:
- source code and pre-trained/fine-tuned checkpoint for NAACL 2021 paper LightningDOT ☆72 Updated last year
- Research Code for NeurIPS 2020 Spotlight paper "Large-Scale Adversarial Training for Vision-and-Language Representation Learning": UNITER… ☆119 Updated 3 years ago
- [TACL 2021] Code and data for the framework in "Multimodal Pretraining Unmasked: A Meta-Analysis and a Unified Framework of Vision-and-La… ☆113 Updated 2 years ago
- [CVPR 2020] Transform and Tell: Entity-Aware News Image Captioning ☆89 Updated 5 months ago
- Starter Code for VALUE benchmark ☆79 Updated 2 years ago
- Multitask Multilingual Multimodal Pre-training ☆68 Updated last year
- Official repository of ICCV 2021 - Image Retrieval on Real-life Images with Pre-trained Vision-and-Language Models ☆98 Updated last month
- Reliably download millions of images efficiently ☆110 Updated 3 years ago
- Extended Intramodal and Intermodal Semantic Similarity Judgments for MS-COCO ☆49 Updated 4 years ago
- ☆187 Updated 4 months ago
- Official code for paper "Spatially Aware Multimodal Transformers for TextVQA" published at ECCV, 2020. ☆62 Updated 3 years ago
- Simple is not Easy: A Simple Strong Baseline for TextVQA and TextCaps [AAAI2021] ☆56 Updated 2 years ago
- Cross-View Language Modeling: Towards Unified Cross-Lingual Cross-Modal Pre-training (ACL 2023) ☆85 Updated last year
- [ACL 2020] PyTorch code for MART: Memory-Augmented Recurrent Transformer for Coherent Video Paragraph Captioning ☆166 Updated 3 years ago
- [ICML 2022] Code and data for our paper "IGLUE: A Benchmark for Transfer Learning across Modalities, Tasks, and Languages" ☆49 Updated last year
- [CVPR'21 Oral] Seeing Out of tHe bOx: End-to-End Pre-training for Vision-Language Representation Learning ☆206 Updated last year
- Official code for WACV 2021 paper - Compositional Learning of Image-Text Query for Image Retrieval ☆55 Updated 2 years ago
- Align and Prompt: Video-and-Language Pre-training with Entity Prompts ☆185 Updated 2 years ago
- A modular framework for Visual Question Answering research by the FAIR A-STAR team ☆45 Updated 3 years ago
- Data and code for CVPR 2020 paper: "VIOLIN: A Large-Scale Dataset for Video-and-Language Inference" ☆158 Updated 4 years ago
- Use CLIP to represent video for Retrieval Task ☆67 Updated 3 years ago
- Code and Resources for the Transformer Encoder Reasoning Network (TERN) - https://arxiv.org/abs/2004.09144 ☆57 Updated 9 months ago
- Dataset and starting code for visual entailment dataset ☆107 Updated 2 years ago
- [ECCV2022] Contrastive Vision-Language Pre-training with Limited Resources ☆42 Updated last year
- A paper list of visual semantic embeddings and text-image retrieval. ☆41 Updated 3 years ago
- Code of Dense Relational Captioning ☆67 Updated last year
- Show, Edit and Tell: A Framework for Editing Image Captions, CVPR 2020 ☆81 Updated 4 years ago
- PyTorch code for "Unifying Vision-and-Language Tasks via Text Generation" (ICML 2021) ☆357 Updated last year
- PyTorch code for EMNLP 2020 Paper "Vokenization: Improving Language Understanding with Visual Supervision" ☆186 Updated 3 years ago
- Pre-trained V+L Data Preparation ☆45 Updated 4 years ago