jmisilo / clip-gpt-captioning
CLIPxGPT Captioner is an image captioning model based on OpenAI's CLIP and GPT-2.
☆117 · Updated 4 months ago
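For context, the core idea behind this kind of CLIP-plus-GPT-2 captioner is prefix conditioning: a CLIP image embedding is projected into a short sequence of GPT-2 input embeddings, and GPT-2 decodes the caption from that prefix. The sketch below illustrates only the general pattern; the class and parameter names are illustrative and not taken from this repository, and `generate(inputs_embeds=...)` requires a reasonably recent `transformers` release.

```python
# Illustrative sketch of CLIP-prefix captioning; names are not from this repo.
import torch
import torch.nn as nn
from transformers import CLIPModel, GPT2LMHeadModel

class ClipGPTCaptioner(nn.Module):
    def __init__(self, prefix_len: int = 10):
        super().__init__()
        self.clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
        self.gpt2 = GPT2LMHeadModel.from_pretrained("gpt2")
        self.prefix_len = prefix_len
        # Map one CLIP image embedding (512-d) to a sequence of GPT-2
        # "prefix" embeddings (prefix_len x 768). In training, this projection
        # is what gets optimized, with a language-modeling loss on captions.
        self.project = nn.Linear(self.clip.config.projection_dim,
                                 prefix_len * self.gpt2.config.n_embd)

    @torch.no_grad()
    def caption(self, pixel_values, tokenizer, max_new_tokens: int = 30):
        img_emb = self.clip.get_image_features(pixel_values=pixel_values)  # (B, 512)
        prefix = self.project(img_emb).view(-1, self.prefix_len,
                                            self.gpt2.config.n_embd)
        out = self.gpt2.generate(inputs_embeds=prefix,
                                 max_new_tokens=max_new_tokens,
                                 pad_token_id=tokenizer.eos_token_id)
        return tokenizer.batch_decode(out, skip_special_tokens=True)

# Usage (illustrative): preprocess an image with CLIPProcessor to get
# pixel_values, pass a GPT2Tokenizer, and call .caption(pixel_values, tok).
```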
Alternatives and similar repositories for clip-gpt-captioning
Users interested in clip-gpt-captioning are comparing it to the libraries listed below.
- CapDec: SOTA Zero-Shot Image Captioning Using CLIP and GPT2, EMNLP 2022 (Findings) ☆197 · Updated last year
- Implementation code for "Exploiting Multiple Sequence Lengths in Fast End to End Training for Image Captioning" ☆92 · Updated 6 months ago
- PyTorch implementation of image captioning using a transformer-based model ☆66 · Updated 2 years ago
- Fine-tuning OpenAI's CLIP model on an Indian fashion dataset ☆50 · Updated 2 years ago
- A simple script that reads a directory of videos, grabs a random frame, and automatically discovers a prompt for it ☆138 · Updated last year
- PyTorch code for "Fine-grained Image Captioning with CLIP Reward" (Findings of NAACL 2022) ☆242 · Updated 2 weeks ago
- Fine-tuning CLIP on a small image/text dataset using the Hugging Face libraries (see the fine-tuning sketch after this list) ☆47 · Updated 2 years ago
- GRIT: Faster and Better Image-captioning Transformer (ECCV 2022) ☆193 · Updated 2 years ago
- Implementation of Zero-Shot Image-to-Text Generation for Visual-Semantic Arithmetic ☆276 · Updated 2 years ago
- [ACM TOMM 2023] Composed Image Retrieval using Contrastive Learning and Task-oriented CLIP-based Features ☆177 · Updated last year
- Using pretrained encoder and language models to generate captions from multimedia inputs ☆97 · Updated 2 years ago
- Generate text captions for images from their embeddings ☆109 · Updated last year
- Tutorials for the FLAVA model (https://arxiv.org/abs/2112.04482) ☆12 · Updated 3 years ago
- [ICCV 2023] Zero-shot Composed Image Retrieval with Textual Inversion ☆174 · Updated last year
- Fine-tuning OpenAI CLIP Model for Image Search on medical images ☆76 · Updated 3 years ago
- Exploring multimodal fusion-type transformer models for visual question answering (on the DAQUAR dataset) ☆35 · Updated 3 years ago
- A Simple Image Clustering Script using CLIP and Hierarchical Clustering ☆37 · Updated 2 years ago
- [CVPR 2023] Positive-Augmented Contrastive Learning for Image and Video Captioning Evaluation ☆62 · Updated 3 months ago
- [ICLR 2023] DeCap: Decoding CLIP Latents for Zero-shot Captioning ☆133 · Updated 2 years ago
- Code for Shifted Diffusion for Text-to-image Generation (CVPR 2023) ☆161 · Updated 2 years ago
- Image Captioning Vision Transformers (ViTs) are transformer models that generate descriptive captions for images by combining the power o… ☆36 · Updated 8 months ago
- [CVPR 2023 (Highlight)] FAME-ViL: Multi-Tasking V+L Model for Heterogeneous Fashion Tasks ☆53 · Updated last year
- [ICLR 2024] Official code for the paper "LLM Blueprint: Enabling Text-to-Image Generation with Complex and Detailed Prompts" ☆79 · Updated last year
- GIT: A Generative Image-to-text Transformer for Vision and Language ☆568 · Updated last year
- EILeV: Eliciting In-Context Learning in Vision-Language Models for Videos Through Curated Data Distributional Properties ☆125 · Updated 7 months ago
- Fine-tuning code for CLIP models ☆228 · Updated 3 months ago
- Search images with a text or image query, using OpenAI's pretrained CLIP model ☆248 · Updated 3 years ago
- Official PyTorch implementation of "CompoDiff: Versatile Composed Image Retrieval With Latent Diffusion" (TMLR 2024) ☆84 · Updated 4 months ago
- An easy-to-use and efficient codebase for extracting OpenAI CLIP (global/grid) features from images and text ☆129 · Updated 5 months ago
- Implementation of the paper "CPTR: Full Transformer Network for Image Captioning" ☆30 · Updated 3 years ago
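Several of the repositories above (e.g. the small-dataset CLIP fine-tuning one) follow the same basic Hugging Face recipe: batch image/text pairs through `CLIPProcessor` and let `CLIPModel` compute its symmetric contrastive loss via `return_loss=True`. A minimal, hedged sketch of that recipe, with placeholder data standing in for a real dataset:

```python
# Minimal sketch of fine-tuning CLIP on a small image/text dataset with the
# Hugging Face libraries. The data below is a placeholder; any iterable of
# {"image": PIL.Image, "text": str} pairs would do.
import torch
from torch.utils.data import DataLoader
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

# Placeholder pairs -- swap in real image/caption data.
pairs = [{"image": Image.new("RGB", (224, 224)), "text": "a plain grey square"}] * 8

def collate(batch):
    # The processor tokenizes the texts and preprocesses the images together.
    return processor(text=[p["text"] for p in batch],
                     images=[p["image"] for p in batch],
                     return_tensors="pt", padding=True)

loader = DataLoader(pairs, batch_size=4, shuffle=True, collate_fn=collate)

model.train()
for batch in loader:
    # return_loss=True makes CLIPModel compute the symmetric contrastive
    # (InfoNCE) loss over the in-batch image/text similarity matrix.
    loss = model(**batch, return_loss=True).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```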