microsoft / M3P

Multitask Multilingual Multimodal Pre-training

☆72

Alternatives and similar repositories for M3P

Users who are interested in M3P are comparing it to the libraries listed below.

zmykevin / UC2
CVPR 2021 Official PyTorch Code for UC2: Universal Cross-lingual Cross-modal Vision-and-Language Pre-training
☆34Updated 4 years ago
salesforce / VD-BERT
☆44Updated 5 months ago
ShannonAI / OpenViDial
Code, Models and Datasets for OpenViDial Dataset
☆132Updated 3 years ago
cooelf / UVR-NMT
Neural Machine Translation with universal Visual Representation (ICLR 2020)
☆89Updated 5 years ago
zhegan27 / VILLA
Research Code for NeurIPS 2020 Spotlight paper "Large-Scale Adversarial Training for Vision-and-Language Representation Learning": UNITER…
☆119Updated 4 years ago
zhegan27 / LXMERT-AdvTrain
Research Code for NeurIPS 2020 Spotlight paper "Large-Scale Adversarial Training for Vision-and-Language Representation Learning": LXMERT…
☆21Updated 5 years ago
berniebear / Multi-HT100M
☆53Updated 3 years ago
vmurahari3 / visdial-bert
Implementation for "Large-scale Pretraining for Visual Dialog" https://arxiv.org/abs/1912.02379
☆97Updated 5 years ago
e-bug / volta
[TACL 2021] Code and data for the framework in "Multimodal Pretraining Unmasked: A Meta-Analysis and a Unified Framework of Vision-and-La…
☆114Updated 3 years ago
zinengtang / VidLanKD
PyTorch version of VidLanKD: Improving Language Understanding via Video-Distilled Knowledge Transfer (NeurIPS 2021)
☆56Updated 2 years ago
zengyan-97 / CCLM
Cross-View Language Modeling: Towards Unified Cross-Lingual Cross-Modal Pre-training (ACL 2023)
☆92Updated 2 years ago
UKPLab / MMT-Retrieval
☆131Updated 2 years ago
eric-xw / Video-guided-Machine-Translation
Starter code for the VMT task and challenge
☆51Updated 5 years ago
e-bug / iglue
[ICML 2022] Code and data for our paper "IGLUE: A Benchmark for Transfer Learning across Modalities, Tasks, and Languages"
☆49Updated 2 years ago
ictnlp / DSTC8-AVSD
We rank the 1st in DSTC8 Audio-Visual Scene-Aware Dialog competition. This is the source code for our IEEE/ACM TASLP (AAAI2020-DSTC8-AVSD…
☆56Updated 2 years ago
jayleicn / VideoLanguageFuturePred
[EMNLP 2020] What is More Likely to Happen Next? Video-and-Language Future Event Prediction
☆51Updated 3 years ago
intersun / LightningDOT
source code and pre-trained/fine-tuned checkpoint for NAACL 2021 paper LightningDOT
☆72Updated 3 years ago
limanling / clip-event
☆107Updated 3 years ago
jamespark3922 / visual-comet
VisualCOMET: Reasoning about the Dynamic Context of a Still Image
☆88Updated 2 years ago
lichengunc / pretrain-vl-data
Pre-trained V+L Data Preparation
☆46Updated 5 years ago
woojeongjin / FewVLM
A Good Prompt Is Worth Millions of Parameters: Low-resource Prompt-based Learning for Vision-Language Models (ACL 2022)
☆43Updated 3 years ago
e-bug / cross-modal-ablation
[EMNLP 2021] Code and data for our paper "Vision-and-Language or Vision-for-Language? On Cross-Modal Influence in Multimodal Transformers…
☆20Updated 3 years ago
necla-ml / SNLI-VE
Dataset and starting code for visual entailment dataset
☆118Updated 3 years ago
VALUE-Leaderboard / StarterCode
Starter Code for VALUE benchmark
☆80Updated 3 years ago
Weili-NLP / UNIMO
UNIMO: Towards Unified-Modal Understanding and Generation via Cross-Modal Contrastive Learning
☆70Updated 4 years ago
MichaelZhouwang / VLUE
This repo contains codes and instructions for baselines in the VLUE benchmark.
☆41Updated 3 years ago
airsplay / vokenization
PyTorch code for EMNLP 2020 Paper "Vokenization: Improving Language Understanding with Visual Supervision"
☆192Updated 4 years ago
wenhuchen / Meta-Module-Network
Code for WACV 2021 Paper "Meta Module Network for Compositional Visual Reasoning"
☆43Updated 4 years ago
MILVLG / rosita
ROSITA: Enhancing Vision-and-Language Semantic Alignments via Cross- and Intra-modal Knowledge Integration
☆56Updated 2 years ago
nttmdlab-nlp / VisualMRC
VisualMRC: Machine Reading Comprehension on Document Images (AAAI2021)
☆56Updated 8 months ago