jmisilo/clip-gpt-captioning
CLIPxGPT Captioner is an image captioning model based on OpenAI's CLIP and GPT-2.
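The core idea is a CLIP image encoder feeding a GPT-2 decoder. Below is a minimal sketch of the common prefix-projection variant of that idea using Hugging Face checkpoints; the projection layer, prefix length, and model names are illustrative assumptions, not necessarily this repository's exact architecture, and the projection would need to be trained on image-caption pairs before the output is meaningful.

```python
# Minimal sketch: CLIP image embedding -> learned projection -> GPT-2 prefix.
# Assumptions (not taken from this repo): prefix length 4, base-size checkpoints.
import torch
import torch.nn as nn
from transformers import CLIPModel, CLIPProcessor, GPT2LMHeadModel, GPT2Tokenizer

clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
gpt2 = GPT2LMHeadModel.from_pretrained("gpt2")
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")

# Map the 512-d CLIP image embedding to `prefix_len` GPT-2 token embeddings.
# In practice this layer (or a small MLP) is trained on paired data; here it
# is randomly initialized, so the generated caption is a placeholder.
prefix_len = 4
project = nn.Linear(clip.config.projection_dim, prefix_len * gpt2.config.n_embd)

@torch.no_grad()
def caption(image, max_new_tokens=30):
    pixels = processor(images=image, return_tensors="pt").pixel_values
    img_emb = clip.get_image_features(pixel_values=pixels)         # (1, 512)
    prefix = project(img_emb).view(1, prefix_len, gpt2.config.n_embd)
    out = gpt2.generate(
        inputs_embeds=prefix,              # decode conditioned on the prefix
        max_new_tokens=max_new_tokens,
        pad_token_id=tokenizer.eos_token_id,
    )
    return tokenizer.decode(out[0], skip_special_tokens=True)
```

Variants of this prefix/mapping idea appear in several of the related projects listed below.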
Related projects:
- Implementation code for the paper "Exploiting Multiple Sequence Lengths in Fast End to End Training for Image Captioning"
- PyTorch implementation of image captioning using a transformer-based model.
- CapDec: SOTA zero-shot image captioning using CLIP and GPT-2 (Findings of EMNLP 2022)
- PyTorch code for "Fine-grained Image Captioning with CLIP Reward" (Findings of NAACL 2022)
- Fine-tuning OpenAI's CLIP model on an Indian fashion dataset
- Using pretrained encoder and language models to generate captions from multimedia inputs.
- VisualGPT (CVPR 2022): GPT as a decoder for vision-language models
- GRIT: Faster and Better Image-captioning Transformer (ECCV 2022)
- A simple script that reads a directory of videos, grabs a random frame, and automatically discovers a prompt for it
- Exploring multimodal fusion-type transformer models for visual question answering (on the DAQUAR dataset)
- An easy-to-use, efficient codebase for extracting OpenAI CLIP (global/grid) features from images and text
- Implementation of Zero-Shot Image-to-Text Generation for Visual-Semantic Arithmetic
- Implementation of the paper "CPTR: Full Transformer Network for Image Captioning"
- [ACM TOMM 2023] Composed Image Retrieval using Contrastive Learning and Task-oriented CLIP-based Features
- (WACV 2025) Vision-language conversation in 10 languages including English, Chinese, French, Spanish, Russian, Japanese, Arabic, Hindi, B…
- Generate text captions for images from their embeddings.
- Image captioning in PyTorch using an LSTM or a Transformer
- Code for CLIPScore, a reference-free image-captioning metric (EMNLP 2021)
- A multimodal inference pipeline that integrates InstructBLIP with textgen-webui for Vicuna and related models.
- An official implementation of "X-CLIP: End-to-End Multi-grained Contrastive Learning for Video-Text Retrieval"
- EILEV: Efficient In-Context Learning in Vision-Language Models for Egocentric Videos
- Fine-tuning OpenAI's CLIP model for image search on medical images
- Image Captioning Using Transformer
- Combining ViT and GPT-2 for image captioning. Trained on MS-COCO. The model was implemented mostly from scratch.
- Search photos on Unsplash based on OpenAI's CLIP model; supports search with joint image+text queries and attention visualization.
- [ICCV 2023] Zero-shot Composed Image Retrieval with Textual Inversion
- [CVPR 2024] ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts