Muennighoff / vilio

🥶Vilio: State-of-the-art VL models in PyTorch & PaddlePaddle

☆90

Alternatives and similar repositories for vilio

Users who are interested in vilio are comparing it to the libraries listed below.

HimariO / HatefulMemesChallenge
☆93 Updated 2 years ago
UKPLab / MMT-Retrieval
☆131 Updated 2 years ago
airsplay / vokenization
PyTorch code for EMNLP 2020 Paper "Vokenization: Improving Language Understanding with Visual Supervision"
☆192 Updated 4 years ago
yikuan8 / Transformers-VQA
An implementation that applies pre-trained V+L models to downstream VQA tasks. Now supports VisualBERT, LXMERT, and UNITER
☆165 Updated 2 years ago
e-bug / volta
[TACL 2021] Code and data for the framework in "Multimodal Pretraining Unmasked: A Meta-Analysis and a Unified Framework of Vision-and-La…
☆114 Updated 3 years ago
Nithin-Holla / meme_challenge
Repository containing code from team Kingsterdam for the Hateful Memes Challenge
☆22 Updated 3 years ago
salesforce / VD-BERT
☆44 Updated 5 months ago
zhegan27 / VILLA
Research Code for NeurIPS 2020 Spotlight paper "Large-Scale Adversarial Training for Vision-and-Language Representation Learning": UNITER…
☆119 Updated 4 years ago
alasdairtran / transform-and-tell
[CVPR 2020] Transform and Tell: Entity-Aware News Image Captioning
☆92 Updated last year
drivendataorg / hateful-memes
☆66 Updated 2 years ago
multimodal / multimodal
A collection of multimodal datasets and visual features for VQA and captioning in PyTorch. Just run "pip install multimodal"
☆83 Updated 3 years ago
allenai / visual-reasoning-rationalization
Code associated with the "Natural Language Rationales with Full-Stack Visual Reasoning" EMNLP Findings 2020 paper
☆24 Updated 4 years ago
rowanz / merlot
MERLOT: Multimodal Neural Script Knowledge Models
☆225 Updated 3 years ago
berniebear / Multi-HT100M
☆53 Updated 3 years ago
ajamjoom / Image-Captions
BERT + Image Captioning
☆134 Updated 4 years ago
Cloud-CV / vilbert-multi-task
12-in-1: Multi-Task Vision and Language Representation Learning Web Demo
☆35 Updated 2 years ago
microsoft / M3P
Multitask Multilingual Multimodal Pre-training
☆71 Updated 3 years ago
zhegan27 / LXMERT-AdvTrain
Research Code for NeurIPS 2020 Spotlight paper "Large-Scale Adversarial Training for Vision-and-Language Representation Learning": LXMERT…
☆21 Updated 5 years ago
ChenRocks / BUTD-UNITER-NLVR2
Supports extracting BUTD features for NLVR2 images.
☆18 Updated 5 years ago
j-min / VL-T5
PyTorch code for "Unifying Vision-and-Language Tasks via Text Generation" (ICML 2021)
☆374 Updated 2 years ago
vmurahari3 / visdial-bert
Implementation for "Large-scale Pretraining for Visual Dialog" https://arxiv.org/abs/1912.02379
☆97 Updated 5 years ago
google-research-datasets / Image-Caption-Quality-Dataset
A dataset of crowdsourced ratings for machine-generated image captions
☆37 Updated 6 years ago
necla-ml / SNLI-VE
Dataset and starting code for visual entailment dataset
☆118 Updated 3 years ago
facebookresearch / mmbt
Supervised Multimodal Bitransformers for Classifying Images and Text
☆256 Updated 4 years ago
intersun / LightningDOT
Source code and pre-trained/fine-tuned checkpoint for the NAACL 2021 paper LightningDOT
☆72 Updated 3 years ago
uclanlp / visualbert
Code for the paper "VisualBERT: A Simple and Performant Baseline for Vision and Language"
☆539 Updated 2 years ago
Dong-JinKim / DenseRelationalCaptioning
Code of Dense Relational Captioning
☆69 Updated 2 years ago
maximek3 / e-ViL
☆40 Updated 3 years ago
cooelf / UVR-NMT
Neural Machine Translation with Universal Visual Representation (ICLR 2020)
☆89 Updated 5 years ago
ExplainableML / CLEVR-X
CLEVR-X: A Visual Reasoning Dataset for Natural Language Explanations
☆29 Updated 2 years ago