inuwamobarak / Image-captioning-ViT
Image Captioning Vision Transformers (ViTs) are transformer models that generate descriptive captions for images by combining the power of Transformers and computer vision. It leverages state-of-the-art pre-trained ViT models and employs technique…
40 · Oct 14, 2024 · Updated last year

Alternatives and similar repositories for Image-captioning-ViT

Users that are interested in Image-captioning-ViT are comparing it to the libraries listed below.
