gchochla / VAuLT

This repo contains the original implementation of VAuLT, the Vision-and-Augmented-Language Transformer. We provide instructions for downloading several multimodal social-media datasets, along with scripts to experiment with. VAuLT is a stack of Transformers: an LM like BERT that preprocesses the text input of ViLT

Related projects

Alternatives and complementary repositories for VAuLT