shreydan / VisionGPT2

Combining ViT and GPT-2 for image captioning. Trained on MS-COCO. The model was implemented mostly from scratch.
34 · Updated last year
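The one-line description does not say how the two models are joined. A common way to condition a GPT-2-style decoder on ViT features is cross-attention. The image is cut into patches and each patch is projected to an embedding. The caption tokens then use those embeddings as keys and values. The pure-Python sketch below assumes that design. It is not taken from the repository, and every name in it (`patchify`, `cross_attention`, `w_patch`, `d_model`, and so on) is hypothetical. It uses random weights and leaves out positional embeddings, multiple heads, layer norms and the decoder's causal self-attention.

```python
import math
import random

def patchify(image, patch):
    """Split an H x W grayscale image (list of rows) into non-overlapping
    patch x patch tiles, each flattened row-major into one vector."""
    h, w = len(image), len(image[0])
    assert h % patch == 0 and w % patch == 0, "image must tile evenly"
    return [
        [image[r + i][c + j] for i in range(patch) for j in range(patch)]
        for r in range(0, h, patch)
        for c in range(0, w, patch)
    ]

def matvec(m, v):
    """Multiply matrix m (list of rows, out_dim x in_dim) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    mx = max(xs)
    exps = [math.exp(x - mx) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention(text_states, image_states, wq, wk, wv):
    """Single-head scaled dot-product cross-attention (hypothetical helper).
    Queries come from the text (GPT-2 side); keys and values come from the
    image patch embeddings (ViT side). No causal mask is applied here:
    every text position may attend to every image patch."""
    keys = [matvec(wk, x) for x in image_states]
    values = [matvec(wv, x) for x in image_states]
    scale = math.sqrt(len(keys[0]))
    out = []
    for t in text_states:
        q = matvec(wq, t)
        weights = softmax([sum(a * b for a, b in zip(q, k)) / scale for k in keys])
        # Weighted sum of the value vectors, one output vector per text token.
        out.append([sum(w * v[i] for w, v in zip(weights, values))
                    for i in range(len(values[0]))])
    return out

def rand_matrix(rows, cols, rng):
    """Small random weight matrix standing in for learned parameters."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

if __name__ == "__main__":
    rng = random.Random(0)
    d_model = 6
    image = [[rng.random() for _ in range(8)] for _ in range(8)]  # toy 8x8 "image"
    patches = patchify(image, 4)                   # 4 patches, 16 values each
    w_patch = rand_matrix(d_model, 16, rng)        # hypothetical patch-embedding projection
    image_states = [matvec(w_patch, p) for p in patches]
    text_states = [[rng.random() for _ in range(d_model)] for _ in range(3)]  # 3 caption tokens
    wq, wk, wv = (rand_matrix(d_model, d_model, rng) for _ in range(3))
    fused = cross_attention(text_states, image_states, wq, wk, wv)
    print(len(fused), len(fused[0]))  # 3 6
```

Running it prints `3 6`, which means one image-conditioned vector per caption token, each of width `d_model`. Plain Python keeps the sketch dependency-free. A real model would use a tensor library and learned weights.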

Alternatives and similar repositories for VisionGPT2:

Users who are interested in VisionGPT2 are comparing it to the libraries listed below.