shreydan / VisionGPT2

Combining ViT and GPT-2 for image captioning. Trained on MS-COCO. The model was implemented mostly from scratch.
30 · Updated last year

Related projects

Alternatives and complementary repositories for VisionGPT2