jmisilo / clip-gpt-captioning
CLIPxGPT Captioner is an image captioning model based on OpenAI's CLIP and GPT-2.
☆115 · Updated last year
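Most of the repositories below follow the same basic recipe as CLIPxGPT Captioner: a frozen CLIP image embedding is mapped into a short sequence of "prefix" vectors in the language model's embedding space, and the decoder (e.g. GPT-2) generates the caption conditioned on that prefix. A minimal numpy sketch of the mapping step, assuming a 512-d CLIP ViT-B/32 embedding and GPT-2's 768-d hidden size (both assumptions, and the random matrix stands in for a learned mapping network):

```python
# Hypothetical sketch of CLIP-prefix captioning: project a CLIP image
# embedding into PREFIX_LEN pseudo-token vectors in the decoder's space.
# All dimensions and names here are illustrative assumptions, not the
# actual CLIPxGPT implementation.
import numpy as np

CLIP_DIM = 512    # CLIP ViT-B/32 image-embedding size (assumed)
GPT2_DIM = 768    # GPT-2 hidden size (assumed)
PREFIX_LEN = 4    # number of prefix tokens fed to the decoder (assumed)

rng = np.random.default_rng(0)
# Stand-in for a learned linear mapping network; real models train this.
W = rng.standard_normal((CLIP_DIM, PREFIX_LEN * GPT2_DIM)) * 0.02

def clip_embedding_to_prefix(image_emb: np.ndarray) -> np.ndarray:
    """Map a (CLIP_DIM,) image embedding to (PREFIX_LEN, GPT2_DIM) prefix tokens."""
    return (image_emb @ W).reshape(PREFIX_LEN, GPT2_DIM)

# Dummy image embedding standing in for CLIP's output.
image_emb = rng.standard_normal(CLIP_DIM)
prefix = clip_embedding_to_prefix(image_emb)
print(prefix.shape)  # (4, 768)
```

In the full pipeline these prefix vectors are concatenated in front of the caption's token embeddings, and only the mapping network (and sometimes the decoder) is trained while CLIP stays frozen.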
Alternatives and similar repositories for clip-gpt-captioning:
Users interested in clip-gpt-captioning are comparing it to the repositories listed below.
- CapDec: SOTA Zero-Shot Image Captioning Using CLIP and GPT2, EMNLP 2022 (Findings) ☆190 · Updated last year
- PyTorch code for "Fine-grained Image Captioning with CLIP Reward" (Findings of NAACL 2022) ☆241 · Updated 2 years ago
- Using pretrained encoder and language models to generate captions from multimedia inputs. ☆94 · Updated last year
- Fine-tuning OpenAI's CLIP model on an Indian fashion dataset ☆50 · Updated last year
- Implementation code for "Exploiting Multiple Sequence Lengths in Fast End to End Training for Image Captioning" ☆86 · Updated last month
- PyTorch implementation of image captioning using a transformer-based model. ☆62 · Updated last year
- Image Captioning Vision Transformers (ViTs) are transformer models that generate descriptive captions for images by combining the power o… ☆28 · Updated 4 months ago
- GRIT: Faster and Better Image-captioning Transformer (ECCV 2022) ☆187 · Updated last year
- Fine-tuning CLIP on a small image/text dataset using Hugging Face libraries ☆44 · Updated 2 years ago
- Implementation of the paper "CPTR: Full Transformer Network for Image Captioning" ☆28 · Updated 2 years ago
- Implementation of "Zero-Shot Image-to-Text Generation for Visual-Semantic Arithmetic" ☆272 · Updated 2 years ago
- Generate text captions for images from their embeddings. ☆102 · Updated last year
- Positive-Augmented Contrastive Learning for Image and Video Captioning Evaluation (CVPR 2023) ☆59 · Updated 3 months ago
- [TMM 2023] VideoXum: Cross-modal Visual and Textural Summarization of Videos ☆39 · Updated 10 months ago
- Easy-to-use, efficient code for extracting OpenAI CLIP (global/grid) features from images and text. ☆119 · Updated last month
- SmallCap: Lightweight Image Captioning Prompted with Retrieval Augmentation ☆102 · Updated last year
- Simple, easy-to-use image captioning implementation ☆9 · Updated 3 years ago
- CaMEL: Mean Teacher Learning for Image Captioning (ICPR 2022) ☆29 · Updated 2 years ago
- [ICCVW 2023] Mapping Memes to Words for Multimodal Hateful Meme Classification ☆24 · Updated last year
- Exploring multimodal fusion-type transformer models for visual question answering (on the DAQUAR dataset) ☆34 · Updated 3 years ago
- Transformer & CNN image captioning model in PyTorch ☆42 · Updated last year
- (ACL 2023) MultiCapCLIP: Auto-Encoding Prompts for Zero-Shot Multilingual Visual Captioning ☆35 · Updated 6 months ago
- VisualGPT (CVPR 2022): GPT as a decoder for vision-language models ☆326 · Updated last year
- GIT: A Generative Image-to-text Transformer for Vision and Language ☆557 · Updated last year
- ☆89 · Updated last year
- Simple image captioning model ☆1,338 · Updated 8 months ago
- Official implementation and dataset for the NAACL 2024 paper "ComCLIP: Training-Free Compositional Image and Text Matching" ☆35 · Updated 6 months ago
- Official PyTorch implementation of our CVPR 2022 paper "Beyond a Pre-Trained Object Detector: Cross-Modal Textual and Visual Context for …" ☆60 · Updated 2 years ago
- NExT-GPT: Any-to-Any Multimodal Large Language Model ☆19 · Updated 3 months ago
- Official implementation of "ConZIC: Controllable Zero-shot Image Captioning by Sampling-Based Polishing" ☆74 · Updated last year