jmisilo / clip-gpt-captioning
CLIPxGPT Captioner is an image captioning model based on OpenAI's CLIP and GPT-2.
☆117 · Updated 6 months ago
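The core recipe behind CLIPxGPT-style captioners is CLIP-prefix captioning: encode the image with CLIP, project the embedding into a short sequence of GPT-2 token embeddings, and let GPT-2 decode a caption from that prefix. The sketch below is illustrative only: `prefix_len`, the linear `project` layer, and the checkpoints are assumptions rather than this repository's actual code, and the projection is untrained here, so real captions require fitting it on image-caption pairs first (CLIP, and often GPT-2, stay frozen).

```python
# Minimal CLIP-prefix captioning sketch (illustrative, untrained projection).
# Requires a recent transformers release that supports
# generate(inputs_embeds=...) for decoder-only models.
import torch
import torch.nn as nn
from PIL import Image
from transformers import CLIPModel, CLIPProcessor, GPT2LMHeadModel, GPT2Tokenizer

clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
gpt2 = GPT2LMHeadModel.from_pretrained("gpt2")
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")

prefix_len = 10  # assumed prefix length, not this repo's setting
# Maps the 512-d CLIP image embedding to prefix_len GPT-2 token embeddings.
project = nn.Linear(clip.config.projection_dim, prefix_len * gpt2.config.n_embd)

image = Image.open("example.jpg")  # hypothetical input file
pixels = processor(images=image, return_tensors="pt").pixel_values
with torch.no_grad():
    img_emb = clip.get_image_features(pixel_values=pixels)        # (1, 512)
    prefix = project(img_emb).view(1, prefix_len, gpt2.config.n_embd)
    out = gpt2.generate(inputs_embeds=prefix, max_new_tokens=30,
                        pad_token_id=tokenizer.eos_token_id)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```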
Alternatives and similar repositories for clip-gpt-captioning
Users interested in clip-gpt-captioning are comparing it to the libraries listed below.
- CapDec: SOTA Zero-Shot Image Captioning Using CLIP and GPT2, EMNLP 2022 (Findings) ☆198 · Updated last year
- PyTorch code for "Fine-grained Image Captioning with CLIP Reward" (Findings of NAACL 2022) ☆245 · Updated 2 months ago
- Generate text captions for images from their embeddings. ☆115 · Updated 2 years ago
- Implementation code for "Exploiting Multiple Sequence Lengths in Fast End to End Training for Image Captioning" ☆93 · Updated 8 months ago
- Implementation of Zero-Shot Image-to-Text Generation for Visual-Semantic Arithmetic ☆278 · Updated 2 years ago
- Implementation of DeepMind's Flamingo vision-language model, based on Hugging Face language models and ready for training ☆168 · Updated 2 years ago
- A simple image clustering script using CLIP and hierarchical clustering (sketched after this list) ☆38 · Updated 2 years ago
- Using pretrained encoder and language models to generate captions from multimedia inputs. ☆98 · Updated 2 years ago
- [ACM TOMM 2023] Composed Image Retrieval using Contrastive Learning and Task-oriented CLIP-based Features ☆181 · Updated last year
- VisualGPT (CVPR 2022): GPT as a decoder for vision-language models ☆337 · Updated 2 years ago
- [ICCVW 2023] Mapping Memes to Words for Multimodal Hateful Meme Classification ☆25 · Updated 4 months ago
- Fine-tuning OpenAI's CLIP model on an Indian fashion dataset ☆51 · Updated 2 years ago
- A simple script that reads a directory of videos, grabs a random frame, and automatically discovers a prompt for it ☆139 · Updated last year
- GIT: A Generative Image-to-text Transformer for Vision and Language ☆572 · Updated last year
- PyTorch implementation of image captioning using a transformer-based model. ☆67 · Updated 2 years ago
- Clipora is a toolkit for fine-tuning OpenCLIP models using Low-Rank Adapters (LoRA). ☆24 · Updated last year
- (WACV 2025 - Oral) Vision-language conversation in 10 languages including English, Chinese, French, Spanish, Russian, Japanese, Arabic, H… ☆83 · Updated 3 weeks ago
- Easy-to-use, efficient code for extracting OpenAI CLIP (Global/Grid) features from images and text. ☆132 · Updated 8 months ago
- Fine-tuning CLIP on a small image/text dataset using Hugging Face libraries ☆50 · Updated 2 years ago
- EILeV: Eliciting In-Context Learning in Vision-Language Models for Videos Through Curated Data Distributional Properties ☆130 · Updated 9 months ago
- [CVPR 2023 & IJCV 2025] Positive-Augmented Contrastive Learning for Image and Video Captioning Evaluation ☆64 · Updated last month
- [ICLR 2024] Official code for the paper "LLM Blueprint: Enabling Text-to-Image Generation with Complex and Detailed Prompts" ☆81 · Updated last year
- Code to train a CLIP model ☆119 · Updated 3 years ago
- [TMM 2023] VideoXum: Cross-modal Visual and Textural Summarization of Videos ☆47 · Updated last year
- Code/data for the paper "LLaVAR: Enhanced Visual Instruction Tuning for Text-Rich Image Understanding" ☆269 · Updated last year
- Retrieval-augmented diffusion from CompVis. ☆53 · Updated 3 years ago
- Exploring multimodal fusion-type transformer models for visual question answering (on the DAQUAR dataset) ☆36 · Updated 3 years ago
- Source code for the paper "AltDiffusion: A Multilingual Text-to-Image Diffusion Model" ☆41 · Updated last year
- CLIPScore (EMNLP 2021) code (the metric is sketched after this list) ☆236 · Updated 2 years ago
- A comprehensive collection of research papers on multimodal representation learning, all of which have b… ☆79 · Updated 2 months ago
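Two of the entries above are simple enough to sketch. For the CLIP + hierarchical clustering script, the whole idea is: embed each image with CLIP, L2-normalize, and run agglomerative clustering on cosine distances. The input directory and cluster count below are assumptions, not the linked repository's defaults; the `metric=` keyword needs scikit-learn >= 1.2 (older releases call it `affinity=`).

```python
# Illustrative sketch: hierarchical clustering of CLIP image embeddings.
import glob
import torch
from PIL import Image
from sklearn.cluster import AgglomerativeClustering
from transformers import CLIPModel, CLIPProcessor

clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

paths = sorted(glob.glob("images/*.jpg"))  # assumed input directory
images = [Image.open(p).convert("RGB") for p in paths]
pixels = processor(images=images, return_tensors="pt").pixel_values
with torch.no_grad():
    emb = clip.get_image_features(pixel_values=pixels)
emb = torch.nn.functional.normalize(emb, dim=-1).numpy()

# Average-linkage agglomerative clustering on cosine distance.
labels = AgglomerativeClustering(n_clusters=5, metric="cosine",
                                 linkage="average").fit_predict(emb)
for path, label in zip(paths, labels):
    print(label, path)
```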
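And for the CLIPScore entry: the metric is a rescaled, clipped cosine similarity between CLIP's image and text embeddings, CLIPScore(i, c) = 2.5 * max(cos(E_i, E_c), 0). A minimal sketch, assuming the ViT-B/32 checkpoint the paper reports:

```python
# Minimal CLIPScore sketch: 2.5 * max(cosine similarity, 0).
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def clip_score(image: Image.Image, caption: str) -> float:
    inputs = processor(text=[caption], images=image, return_tensors="pt",
                       padding=True, truncation=True)
    with torch.no_grad():
        img = clip.get_image_features(pixel_values=inputs.pixel_values)
        txt = clip.get_text_features(input_ids=inputs.input_ids,
                                     attention_mask=inputs.attention_mask)
    cos = torch.nn.functional.cosine_similarity(img, txt).item()
    return 2.5 * max(cos, 0.0)

print(clip_score(Image.open("example.jpg"), "a dog on the beach"))  # hypothetical
```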