forbes110 / PLEDGE--Paragraph-LEvel-image-Description-GEneration

Applies an end-to-end model architecture (ViT + GPT) to describe images in paragraph-level detail, rather than performing traditional image captioning, which provides only object detections or a few simple sentences.
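The description names the architecture but not how the two parts connect. Below is a minimal NumPy sketch of the general ViT-encoder / GPT-style-decoder pattern. It assumes one layer and one attention head per side, random untrained weights, and hypothetical sizes (`D`, `PATCH`, `VOCAB`) and function names. It does not reproduce the repository's actual implementation, weights, or tokenizer. The image is split into patches and encoded. A causal decoder then attends to its own previous tokens through masked self-attention and to the patch encodings through cross-attention, emitting one token at a time.

```python
import numpy as np

# Hypothetical toy sizes; real ViT/GPT models are far larger and pretrained.
D, PATCH, VOCAB = 32, 8, 100
rng = np.random.default_rng(0)

def init(*shape):
    return rng.normal(0.0, 0.02, shape)

def layer_norm(x, eps=1e-5):
    return (x - x.mean(-1, keepdims=True)) / np.sqrt(x.var(-1, keepdims=True) + eps)

def softmax(x):
    e = np.exp(x - x.max(-1, keepdims=True))
    return e / e.sum(-1, keepdims=True)

def gelu(x):
    return 0.5 * x * (1 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x**3)))

def attention(q_in, kv_in, Wq, Wk, Wv, causal=False):
    # Single-head scaled dot-product attention.
    q, k, v = q_in @ Wq, kv_in @ Wk, kv_in @ Wv
    scores = q @ k.T / np.sqrt(q.shape[-1])
    if causal:  # block each position from attending to later positions
        n = scores.shape[0]
        scores = np.where(np.triu(np.ones((n, n), dtype=bool), k=1), -1e9, scores)
    return softmax(scores) @ v

def patchify(img, p=PATCH):
    # (H, W, C) image -> (num_patches, p*p*C) flattened non-overlapping patches.
    H, W, C = img.shape
    x = img.reshape(H // p, p, W // p, p, C).transpose(0, 2, 1, 3, 4)
    return x.reshape(-1, p * p * C)

def make_params(img_size=32, channels=3, max_len=64):
    n_patches = (img_size // PATCH) ** 2
    names = ["enc_q", "enc_k", "enc_v", "self_q", "self_k", "self_v",
             "cross_q", "cross_k", "cross_v"]
    P = {n: init(D, D) for n in names}
    P.update(patch_proj=init(PATCH * PATCH * channels, D), enc_pos=init(n_patches, D),
             tok_emb=init(VOCAB, D), dec_pos=init(max_len, D),
             enc_mlp1=init(D, 4 * D), enc_mlp2=init(4 * D, D),
             dec_mlp1=init(D, 4 * D), dec_mlp2=init(4 * D, D))
    return P

def encode(img, P):
    # ViT-style encoder: patch embedding + position embedding + one pre-norm block.
    x = patchify(img) @ P["patch_proj"] + P["enc_pos"]
    h = layer_norm(x)
    x = x + attention(h, h, P["enc_q"], P["enc_k"], P["enc_v"])
    x = x + gelu(layer_norm(x) @ P["enc_mlp1"]) @ P["enc_mlp2"]
    return layer_norm(x)  # (num_patches, D) "memory" for the decoder

def decode(tokens, memory, P):
    # GPT-style decoder block with an added cross-attention step over image patches.
    x = P["tok_emb"][tokens] + P["dec_pos"][: len(tokens)]
    h = layer_norm(x)
    x = x + attention(h, h, P["self_q"], P["self_k"], P["self_v"], causal=True)
    x = x + attention(layer_norm(x), memory, P["cross_q"], P["cross_k"], P["cross_v"])
    x = x + gelu(layer_norm(x) @ P["dec_mlp1"]) @ P["dec_mlp2"]
    return layer_norm(x) @ P["tok_emb"].T  # (seq_len, VOCAB) logits, embedding weights tied

def greedy_generate(img, P, bos=0, eos=1, max_new=20):
    # Autoregressive decoding: append the most likely next token until EOS or the limit.
    memory = encode(img, P)
    tokens = [bos]
    for _ in range(max_new):
        next_id = int(decode(np.array(tokens), memory, P)[-1].argmax())
        tokens.append(next_id)
        if next_id == eos:
            break
    return tokens
```

Paragraph-level description differs from single-sentence captioning mainly in training data and output length (a larger `max_new`), not in this basic data flow. In practice, such models stack many layers and heads and start from pretrained weights. Hugging Face `transformers`, for example, provides `VisionEncoderDecoderModel` for pairing a ViT encoder with a GPT-2 decoder.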
