[TACL 2021] Code and data for the framework in "Multimodal Pretraining Unmasked: A Meta-Analysis and a Unified Framework of Vision-and-Language BERTs"
☆114Mar 24, 2022Updated 3 years ago
Alternatives and similar repositories for volta
Users that are interested in volta are comparing it to the libraries listed below
Sorting:
- [ICML 2022] Code and data for our paper "IGLUE: A Benchmark for Transfer Learning across Modalities, Tasks, and Languages"☆49Dec 7, 2022Updated 3 years ago
- [EMNLP 2021] Code and data for our paper "Vision-and-Language or Vision-for-Language? On Cross-Modal Influence in Multimodal Transformers…☆20Jan 17, 2022Updated 4 years ago
- [EMNLP 2021] Code and data for our paper "Visually Grounded Reasoning across Languages and Cultures"☆30Dec 30, 2021Updated 4 years ago
- CVPR 2021 Official Pytorch Code for UC2: Universal Cross-lingual Cross-modal Vision-and-Language Pre-training☆34Nov 9, 2021Updated 4 years ago
- Research Code for NeurIPS 2020 Spotlight paper "Large-Scale Adversarial Training for Vision-and-Language Representation Learning": UNITER…☆119Jan 13, 2021Updated 5 years ago
- Research code for ECCV 2020 paper "UNITER: UNiversal Image-TExt Representation Learning"☆801Jun 30, 2021Updated 4 years ago
- Research Code for NeurIPS 2020 Spotlight paper "Large-Scale Adversarial Training for Vision-and-Language Representation Learning": LXMERT…☆21Oct 20, 2020Updated 5 years ago
- Dataset for Bilingual VLN☆11Dec 5, 2020Updated 5 years ago
- Multi Task Vision and Language☆825Feb 16, 2022Updated 4 years ago
- Grid features pre-training code for visual question answering☆269Sep 17, 2021Updated 4 years ago
- Recent Advances in Vision and Language PreTrained Models (VL-PTMs)☆1,155Aug 19, 2022Updated 3 years ago
- [CVPR'21 Oral] Seeing Out of tHe bOx: End-to-End Pre-training for Vision-Language Representation Learning☆208Sep 30, 2022Updated 3 years ago
- Oscar and VinVL☆1,052Aug 28, 2023Updated 2 years ago
- PyTorch bottom-up attention with Detectron2☆239Jan 4, 2022Updated 4 years ago
- PyTorch code for EMNLP 2020 Paper "Vokenization: Improving Language Understanding with Visual Supervision"☆192Mar 8, 2021Updated 4 years ago
- Code for ICLR 2020 paper "VL-BERT: Pre-training of Generic Visual-Linguistic Representations".☆746May 22, 2023Updated 2 years ago
- ☆18Oct 3, 2023Updated 2 years ago
- A unified framework to jointly model images, text, and human attention traces.☆79May 24, 2021Updated 4 years ago
- Vision-Language Pre-training for Image Captioning and Question Answering☆423Jan 18, 2022Updated 4 years ago
- Code for the paper "VisualBERT: A Simple and Performant Baseline for Vision and Language"☆540May 1, 2023Updated 2 years ago
- PyTorch code for EMNLP 2019 paper "LXMERT: Learning Cross-Modality Encoder Representations from Transformers".☆966Oct 22, 2022Updated 3 years ago
- ☆10Jul 23, 2021Updated 4 years ago
- ☆11Sep 7, 2020Updated 5 years ago
- Project page for "Visual Grounding in Video for Unsupervised Word Translation" CVPR 2020☆43Apr 26, 2020Updated 5 years ago
- A paper list of visual semantic embeddings and text-image retrieval.☆42Dec 4, 2020Updated 5 years ago
- Code for ViLBERTScore in EMNLP Eval4NLP☆18Oct 27, 2022Updated 3 years ago
- ☆478Nov 21, 2022Updated 3 years ago
- Cross-modal Coherence Modeling for Caption Generation☆11Jul 24, 2020Updated 5 years ago
- PyTorch code for: Learning to Generate Grounded Visual Captions without Localization Supervision☆46Jul 29, 2020Updated 5 years ago
- Unpaired Image Captioning☆36Mar 25, 2021Updated 4 years ago
- Reliably download millions of images efficiently☆118Mar 8, 2021Updated 4 years ago
- PyTorch code for EMNLP 2020 paper "X-LXMERT: Paint, Caption and Answer Questions with Multi-Modal Transformers"☆50Aug 27, 2021Updated 4 years ago
- ☆21Jul 25, 2024Updated last year
- Code and data for ImageCoDe, a contextual vison-and-language benchmark☆41Mar 1, 2024Updated 2 years ago
- Research code for CVPR 2022 paper: "EMScore: Evaluating Video Captioning via Coarse-Grained and Fine-Grained Embedding Matching"☆26Oct 20, 2022Updated 3 years ago
- [ICLR 2021] "InfoBERT: Improving Robustness of Language Models from An Information Theoretic Perspective" by Boxin Wang, Shuohang Wang, Y…☆85Oct 25, 2023Updated 2 years ago
- Implementation for MAF: Multimodal Alignment Framework☆46Nov 25, 2020Updated 5 years ago
- PyTorch implementation of "Debiased Visual Question Answering from Feature and Sample Perspectives" (NeurIPS 2021)☆27Oct 13, 2022Updated 3 years ago
- Pytorch version of DeCEMBERT: Learning from Noisy Instructional Videos via Dense Captions and Entropy Minimization (NAACL 2021)☆17Jan 12, 2023Updated 3 years ago