jmisilo / clip-gpt-captioning
CLIPxGPT Captioner is an image captioning model based on OpenAI's CLIP and GPT-2.
☆117 · Updated 2 months ago
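As a rough illustration of the CLIP-to-GPT-2 captioning idea behind this repository, a minimal ClipCap-style sketch using Hugging Face `transformers` might look like the following. This is not the project's actual code: the `prefix_len` value, the `mapper` projection, the chosen checkpoints, and the greedy decoding loop are all illustrative assumptions, and the mapper would need to be trained on image-caption pairs before producing sensible captions.

```python
# Minimal ClipCap-style sketch of CLIP -> GPT-2 captioning.
# NOTE: not the jmisilo/clip-gpt-captioning source; prefix_len, mapper, and the
# greedy decoding loop are illustrative assumptions. The linear mapper below is
# untrained, so real use requires training it on image-caption pairs first.
import torch
import torch.nn as nn
from PIL import Image
from transformers import CLIPModel, CLIPProcessor, GPT2LMHeadModel, GPT2Tokenizer

clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
proc = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
gpt2 = GPT2LMHeadModel.from_pretrained("gpt2").eval()
tok = GPT2Tokenizer.from_pretrained("gpt2")

prefix_len = 10  # number of "visual prefix" tokens fed to GPT-2 (assumption)

# Map one CLIP image embedding (512-d for ViT-B/32) to a sequence of
# prefix_len GPT-2 input embeddings (768-d each).
mapper = nn.Linear(clip.config.projection_dim, prefix_len * gpt2.config.n_embd)

@torch.no_grad()
def caption(image: Image.Image, max_new_tokens: int = 30) -> str:
    pixels = proc(images=image, return_tensors="pt").pixel_values
    img_emb = clip.get_image_features(pixel_values=pixels)            # (1, 512)
    embeds = mapper(img_emb).view(1, prefix_len, gpt2.config.n_embd)  # (1, 10, 768)
    generated = []
    for _ in range(max_new_tokens):                                   # greedy decoding
        logits = gpt2(inputs_embeds=embeds).logits[:, -1, :]
        next_id = logits.argmax(dim=-1)
        if next_id.item() == tok.eos_token_id:
            break
        generated.append(next_id.item())
        next_emb = gpt2.transformer.wte(next_id).unsqueeze(1)         # embed new token
        embeds = torch.cat([embeds, next_emb], dim=1)
    return tok.decode(generated)

# Usage: caption(Image.open("photo.jpg").convert("RGB"))
```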
Alternatives and similar repositories for clip-gpt-captioning:
Users interested in clip-gpt-captioning are comparing it with the repositories listed below.
- CapDec: SOTA Zero Shot Image Captioning Using CLIP and GPT2, EMNLP 2022 (Findings) ☆195 · Updated last year
- Implementation code of the work "Exploiting Multiple Sequence Lengths in Fast End to End Training for Image Captioning" ☆91 · Updated 4 months ago
- PyTorch implementation of image captioning using a transformer-based model. ☆66 · Updated 2 years ago
- Implementation of Zero-Shot Image-to-Text Generation for Visual-Semantic Arithmetic ☆274 · Updated 2 years ago
- A simple script that reads a directory of videos, grabs a random frame, and automatically discovers a prompt for it ☆134 · Updated last year
- Generate text captions for images from their embeddings. ☆106 · Updated last year
- CLIPScore EMNLP code ☆221 · Updated 2 years ago
- Implementation of the DeepMind Flamingo vision-language model, based on Hugging Face language models and ready for training ☆167 · Updated 2 years ago
- [TMM 2023] VideoXum: Cross-modal Visual and Textural Summarization of Videos ☆43 · Updated last year
- GRIT: Faster and Better Image-captioning Transformer (ECCV 2022) ☆190 · Updated last year
- ICLR 2023 DeCap: Decoding CLIP Latents for Zero-shot Captioning ☆132 · Updated 2 years ago
- Implementation of the paper "CPTR: Full Transformer Network for Image Captioning" ☆30 · Updated 2 years ago
- Using pretrained encoder and language models to generate captions from multimedia inputs. ☆97 · Updated 2 years ago
- [ICCV 2023] Zero-shot Composed Image Retrieval with Textual Inversion ☆172 · Updated 11 months ago
- ☆224 · Updated 3 years ago
- Finetuning CLIP on a small image/text dataset using Hugging Face libraries ☆47 · Updated 2 years ago
- [ICLR 2024] Official code for the paper "LLM Blueprint: Enabling Text-to-Image Generation with Complex and Detailed Prompts" ☆76 · Updated 11 months ago
- (ACL 2023) MultiCapCLIP: Auto-Encoding Prompts for Zero-Shot Multilingual Visual Captioning ☆35 · Updated 8 months ago
- Official implementation of "ConZIC: Controllable Zero-shot Image Captioning by Sampling-Based Polishing" ☆73 · Updated last year
- A Simple Image Clustering Script using CLIP and Hierarchical Clustering ☆37 · Updated 2 years ago
- Transformer & CNN Image Captioning model in PyTorch. ☆42 · Updated 2 years ago
- GIT: A Generative Image-to-text Transformer for Vision and Language ☆565 · Updated last year
- Code to train CLIP model ☆111 · Updated 3 years ago
- Exploring multimodal fusion-type transformer models for visual question answering (on the DAQUAR dataset) ☆34 · Updated 3 years ago
- Fine-tuning OpenAI's CLIP model on Indian Fashion Dataset ☆51 · Updated last year
- SmallCap: Lightweight Image Captioning Prompted with Retrieval Augmentation ☆107 · Updated last year
- PyTorch code for "Fine-grained Image Captioning with CLIP Reward" (Findings of NAACL 2022) ☆242 · Updated 2 years ago
- NLX-GPT: A Model for Natural Language Explanations in Vision and Vision-Language Tasks, CVPR 2022 (Oral) ☆48 · Updated last year
- EILeV: Eliciting In-Context Learning in Vision-Language Models for Videos Through Curated Data Distributional Properties ☆123 · Updated 5 months ago
- Code implementation of our NeurIPS 2023 paper: Vocabulary-free Image Classification ☆107 · Updated last year