gchochla / VAuLT

This repo contains the original implementation of VAuLT, the Vision-and-Augmented-Language Transformer. We provide instructions for downloading some multimodal social-media datasets, along with scripts to experiment with. VAuLT is a stack of Transformers: an LM like BERT preprocesses the text input of ViLT.

Alternatives and similar repositories for VAuLT:

Users who are interested in VAuLT are comparing it to the libraries listed below.