forbes110 / PLEDGE--Paragraph-LEvel-image-Description-GEneration

Applies an end-to-end architecture (a ViT image encoder with a GPT text decoder) to describe images in paragraph-level detail, unlike traditional image captioning, which provides only object detections or a few simple sentences.
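The listing does not show the repository's code. The following toy PyTorch model is a rough sketch of what an end-to-end "ViT + GPT" describer involves, and it is not the PLEDGE implementation. A ViT-style patch encoder produces image tokens. A GPT-style causal decoder attends to them through cross-attention and emits text greedily, one token at a time. `TinyViTGPT`, `PatchEmbed`, all sizes, and the special token ids (`bos_id=1`, `eos_id=2`) are hypothetical. The weights are random, so the generated ids mean nothing until the model is trained.

```python
import torch
import torch.nn as nn


class PatchEmbed(nn.Module):
    """ViT-style stem: cut the image into non-overlapping patches and project each to `dim`."""

    def __init__(self, img_size=32, patch=8, in_ch=3, dim=64):
        super().__init__()
        # A conv with kernel == stride == patch equals "flatten each patch + linear layer".
        self.proj = nn.Conv2d(in_ch, dim, kernel_size=patch, stride=patch)
        self.pos = nn.Parameter(torch.zeros(1, (img_size // patch) ** 2, dim))

    def forward(self, x):  # x: (B, C, H, W)
        x = self.proj(x).flatten(2).transpose(1, 2)  # (B, num_patches, dim)
        return x + self.pos


class TinyViTGPT(nn.Module):
    """Hypothetical toy describer: ViT-like encoder -> GPT-like causal decoder with cross-attention."""

    def __init__(self, vocab=100, dim=64, heads=4, layers=2, max_len=64):
        super().__init__()
        self.patch = PatchEmbed(dim=dim)
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(dim, heads, dim_feedforward=4 * dim, batch_first=True), layers)
        self.tok = nn.Embedding(vocab, dim)
        self.pos = nn.Embedding(max_len, dim)
        # Each decoder layer: masked self-attention, cross-attention to image patches, then an MLP.
        self.decoder = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(dim, heads, dim_feedforward=4 * dim, batch_first=True), layers)
        self.lm_head = nn.Linear(dim, vocab)

    def forward(self, images, tokens):  # tokens: (B, T) int64 ids
        memory = self.encoder(self.patch(images))  # (B, num_patches, dim)
        T = tokens.size(1)
        h = self.tok(tokens) + self.pos(torch.arange(T, device=tokens.device))
        # Causal mask: -inf above the diagonal, so position t attends only to positions <= t.
        causal = torch.triu(torch.full((T, T), float("-inf"), device=tokens.device), diagonal=1)
        h = self.decoder(h, memory, tgt_mask=causal)
        return self.lm_head(h)  # (B, T, vocab) next-token logits

    @torch.no_grad()
    def generate(self, images, bos_id=1, eos_id=2, max_new=20):
        """Greedy decoding: repeatedly append the most likely next token."""
        tokens = torch.full((images.size(0), 1), bos_id, dtype=torch.long, device=images.device)
        for _ in range(max_new):
            next_tok = self(images, tokens)[:, -1].argmax(-1, keepdim=True)  # (B, 1)
            tokens = torch.cat([tokens, next_tok], dim=1)
            if (next_tok == eos_id).all():  # stop once every sequence has emitted EOS
                break
        return tokens


model = TinyViTGPT().eval()  # random weights: outputs are meaningless until trained
images = torch.randn(2, 3, 32, 32)
paragraph_ids = model.generate(images)  # (2, up to 21) token ids, each row starting with bos_id
```

Putting cross-attention inside the decoder is what makes the pipeline end-to-end: the caption loss can be backpropagated through the decoder into the image encoder. A similar pairing of pretrained checkpoints can be assembled with Hugging Face `transformers` via `VisionEncoderDecoderModel.from_encoder_decoder_pretrained`. The listing does not state whether PLEDGE uses that class.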

Alternatives and similar repositories for PLEDGE--Paragraph-LEvel-image-Description-GEneration:

Users who are interested in PLEDGE--Paragraph-LEvel-image-Description-GEneration are comparing it to the libraries listed below.