jmisilo / clip-gpt-captioning
CLIPxGPT Captioner is an image captioning model based on OpenAI's CLIP and GPT-2.
☆115 · Updated this week
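The core idea behind CLIP-to-GPT captioners is to map a CLIP image embedding into a short sequence of "prefix" embeddings that condition a GPT-2-style decoder. A minimal sketch of that mapping is below; the dimensions, class name, and prefix length are illustrative assumptions, not the actual configuration of this repository.

```python
# Hedged sketch: project a CLIP image embedding into a GPT-2 prefix.
# clip_dim=512, gpt_dim=768, prefix_len=10 are illustrative defaults,
# not the repo's real hyperparameters.
import torch
import torch.nn as nn

class PrefixMapper(nn.Module):
    """Maps one CLIP embedding to `prefix_len` decoder-space embeddings."""

    def __init__(self, clip_dim: int = 512, gpt_dim: int = 768, prefix_len: int = 10):
        super().__init__()
        self.prefix_len = prefix_len
        self.gpt_dim = gpt_dim
        # A single learned linear layer producing the whole prefix at once.
        self.proj = nn.Linear(clip_dim, prefix_len * gpt_dim)

    def forward(self, clip_embed: torch.Tensor) -> torch.Tensor:
        # clip_embed: (batch, clip_dim) -> (batch, prefix_len, gpt_dim)
        out = self.proj(clip_embed)
        return out.view(-1, self.prefix_len, self.gpt_dim)

mapper = PrefixMapper()
image_embed = torch.randn(2, 512)  # stand-in for CLIP image features
prefix = mapper(image_embed)
print(prefix.shape)  # torch.Size([2, 10, 768])
```

In a full captioner, `prefix` would be concatenated in front of the caption token embeddings and fed to the GPT-2 decoder, which is trained (or fine-tuned) to continue the prefix with the caption text.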
Alternatives and similar repositories for clip-gpt-captioning:
Users interested in clip-gpt-captioning are comparing it to the libraries listed below.
- CapDec: SOTA Zero-Shot Image Captioning Using CLIP and GPT2, EMNLP 2022 (Findings) ☆190 · Updated last year
- Implementation code of the work "Exploiting Multiple Sequence Lengths in Fast End to End Training for Image Captioning" ☆86 · Updated last month
- An easy-to-use and efficient codebase for extracting OpenAI CLIP (Global/Grid) features from images and text ☆119 · Updated last month
- [TMM 2023] VideoXum: Cross-modal Visual and Textural Summarization of Videos ☆39 · Updated 10 months ago
- Fine-tuning OpenAI's CLIP model on an Indian Fashion Dataset ☆50 · Updated last year
- Using pretrained encoder and language models to generate captions from multimedia inputs ☆94 · Updated last year
- A simple script that reads a directory of videos, grabs a random frame, and automatically discovers a prompt for it ☆133 · Updated last year
- Combining ViT and GPT-2 for image captioning, trained on MS-COCO; the model was implemented mostly from scratch ☆38 · Updated last year
- [ICCVW 2023] Mapping Memes to Words for Multimodal Hateful Meme Classification ☆24 · Updated last year
- Fine-tuning CLIP on a small image/text dataset using Hugging Face libraries ☆44 · Updated 2 years ago
- PyTorch code for "Fine-grained Image Captioning with CLIP Reward" (Findings of NAACL 2022) ☆241 · Updated 2 years ago
- PyTorch implementation of image captioning using a transformer-based model ☆62 · Updated last year
- GRIT: Faster and Better Image-captioning Transformer (ECCV 2022) ☆188 · Updated last year
- Generate text captions for images from their embeddings ☆102 · Updated last year
- Positive-Augmented Contrastive Learning for Image and Video Captioning Evaluation (CVPR 2023) ☆59 · Updated 3 months ago
- A simple image clustering script using CLIP and hierarchical clustering ☆34 · Updated last year
- NLX-GPT: A Model for Natural Language Explanations in Vision and Vision-Language Tasks, CVPR 2022 (Oral) ☆46 · Updated last year
- Implementation of Zero-Shot Image-to-Text Generation for Visual-Semantic Arithmetic ☆272 · Updated 2 years ago
- SimVLM: Simple Visual Language Model Pretraining with Weak Supervision ☆36 · Updated 2 years ago
- (WACV 2025 - Oral) Vision-language conversation in 10 languages including English, Chinese, French, Spanish, Russian, Japanese, Arabic, H… ☆82 · Updated this week
- Visual Question Answering in PyTorch with various attention models ☆20 · Updated 4 years ago
- Clipora is a toolkit for fine-tuning OpenCLIP models using Low-Rank Adapters (LoRA) ☆19 · Updated 6 months ago
- SmallCap: Lightweight Image Captioning Prompted with Retrieval Augmentation ☆102 · Updated last year
- Dataset for the investigation of visual semiotics, and how specific visual features and design choices can elicit specific emotions, thou… ☆10 · Updated last year
- [ICLR 2023] DeCap: Decoding CLIP Latents for Zero-shot Captioning ☆128 · Updated last year
- Exploring multimodal fusion-type transformer models for visual question answering (on the DAQUAR dataset) ☆34 · Updated 3 years ago
- Artistic Vision-Language Understanding with Adapter-enhanced MiniGPT-4 ☆27 · Updated last year
- [CVPR 2023] Official repository of the paper "Fine-tuned CLIP models are efficient video learners" ☆264 · Updated 10 months ago
- Image-captioning Vision Transformers (ViTs) are transformer models that generate descriptive captions for images by combining the power o… ☆28 · Updated 4 months ago
- Retrieval-augmented diffusion from CompVis ☆52 · Updated 2 years ago