jmisilo / clip-gpt-captioning
CLIPxGPT Captioner is an image captioning model based on OpenAI's CLIP and GPT-2.
☆115 · Updated last month
Alternatives and similar repositories for clip-gpt-captioning:
Users interested in clip-gpt-captioning are comparing it to the repositories listed below.
- PyTorch implementation of image captioning using a transformer-based model. ☆65 · Updated last year
- Implementation code of the work "Exploiting Multiple Sequence Lengths in Fast End to End Training for Image Captioning" ☆88 · Updated 3 months ago
- CapDec: SOTA zero-shot image captioning using CLIP and GPT-2 (EMNLP 2022 Findings) ☆192 · Updated last year
- PyTorch code for "Fine-grained Image Captioning with CLIP Reward" (Findings of NAACL 2022) ☆242 · Updated 2 years ago
- GIT: A Generative Image-to-text Transformer for Vision and Language ☆561 · Updated last year
- GRIT: Faster and Better Image-captioning Transformer (ECCV 2022) ☆189 · Updated last year
- Using pretrained encoder and language models to generate captions from multimedia inputs. ☆95 · Updated 2 years ago
- Using an LSTM or Transformer to solve image captioning in PyTorch ☆76 · Updated 3 years ago
- Combining ViT and GPT-2 for image captioning. Trained on MS-COCO; the model was implemented mostly from scratch. ☆42 · Updated last year
- [TMM 2023] VideoXum: Cross-modal Visual and Textural Summarization of Videos ☆43 · Updated 11 months ago
- Positive-Augmented Contrastive Learning for Image and Video Captioning Evaluation (CVPR 2023) ☆60 · Updated 3 weeks ago
- Implementation of DeepMind's Flamingo vision-language model, based on Hugging Face language models and ready for training ☆166 · Updated last year
- Implementation of Zero-Shot Image-to-Text Generation for Visual-Semantic Arithmetic ☆273 · Updated 2 years ago
- An easy-to-use, user-friendly, and efficient codebase for extracting OpenAI CLIP (global/grid) features from images and text ☆122 · Updated 2 months ago
- Transformer & CNN image captioning model in PyTorch. ☆42 · Updated 2 years ago
- SmallCap: Lightweight Image Captioning Prompted with Retrieval Augmentation ☆101 · Updated last year
- Official implementation of "ConZIC: Controllable Zero-shot Image Captioning by Sampling-Based Polishing" ☆73 · Updated last year
- Fine-tuning the OpenAI CLIP model for image search on medical images ☆76 · Updated 2 years ago
- Implementation of the paper "CPTR: Full Transformer Network for Image Captioning" ☆30 · Updated 2 years ago
- Generate text captions for images from their embeddings. ☆105 · Updated last year
- (ACL 2023) MultiCapCLIP: Auto-Encoding Prompts for Zero-Shot Multilingual Visual Captioning ☆35 · Updated 7 months ago
- ☆76 · Updated 2 years ago
- ☆13 · Updated last year
- Hyperparameter analysis for image captioning using LSTMs and Transformers ☆26 · Updated last year
- Simple and easy-to-use image captioning implementation ☆9 · Updated 3 years ago
- [ACM TOMM 2023] Composed Image Retrieval using Contrastive Learning and Task-oriented CLIP-based Features ☆174 · Updated last year
- DeCap: Decoding CLIP Latents for Zero-shot Captioning (ICLR 2023) ☆130 · Updated 2 years ago
- Exploring multimodal fusion-type transformer models for visual question answering (on the DAQUAR dataset) ☆34 · Updated 3 years ago
- A simple script that reads a directory of videos, grabs a random frame, and automatically discovers a prompt for it ☆134 · Updated last year
- SimVLM: Simple Visual Language Model Pretraining with Weak Supervision ☆36 · Updated 2 years ago