jmisilo / clip-gpt-captioning
CLIPxGPT Captioner is an image captioning model based on OpenAI's CLIP and GPT-2.
☆114 Updated last year
Alternatives and similar repositories for clip-gpt-captioning:
Users who are interested in clip-gpt-captioning are comparing it to the repositories listed below:
- Implementation code of the work "Exploiting Multiple Sequence Lengths in Fast End to End Training for Image Captioning" ☆86 Updated 3 weeks ago
- Pytorch implementation of image captioning using transformer-based model. ☆62 Updated last year
- CapDec: SOTA Zero Shot Image Captioning Using CLIP and GPT2, EMNLP 2022 (findings) ☆188 Updated 11 months ago
- PyTorch code for "Fine-grained Image Captioning with CLIP Reward" (Findings of NAACL 2022) ☆240 Updated 2 years ago
- Using pretrained encoder and language models to generate captions from multimedia inputs. ☆94 Updated last year
- Image Captioning Vision Transformers (ViTs) are transformer models that generate descriptive captions for images by combining the power o… ☆27 Updated 3 months ago
- Implementation of Zero-Shot Image-to-Text Generation for Visual-Semantic Arithmetic ☆272 Updated 2 years ago
- Fine tuning OpenAI's CLIP model on Indian Fashion Dataset ☆50 Updated last year
- Combining ViT and GPT-2 for image captioning. Trained on MS-COCO. The model was implemented mostly from scratch. ☆34 Updated last year
- [ACM TOMM 2023] - Composed Image Retrieval using Contrastive Learning and Task-oriented CLIP-based Features ☆171 Updated last year
- Generate text captions for images from their embeddings. ☆102 Updated last year
- GRIT: Faster and Better Image-captioning Transformer (ECCV 2022) ☆185 Updated last year
- Image Captioning using CNN and Transformer. ☆50 Updated 3 years ago
- [ICCVW 2023] - Mapping Memes to Words for Multimodal Hateful Meme Classification ☆24 Updated 11 months ago
- [TMM 2023] VideoXum: Cross-modal Visual and Textual Summarization of Videos ☆37 Updated 9 months ago
- ☆212 Updated 2 years ago
- Evaluation tools for image captioning. Including BLEU, ROUGE-L, CIDEr, METEOR, SPICE scores. ☆25 Updated last year
- GIT: A Generative Image-to-text Transformer for Vision and Language ☆555 Updated last year
- ☆75 Updated 2 years ago
- Implementation of the paper CPTR : FULL TRANSFORMER NETWORK FOR IMAGE CAPTIONING ☆28 Updated 2 years ago
- [ICCV 2023] - Zero-shot Composed Image Retrieval with Textual Inversion ☆166 Updated 8 months ago
- A simple script that reads a directory of videos, grabs a random frame, and automatically discovers a prompt for it ☆133 Updated 11 months ago
- CuMo: Scaling Multimodal LLM with Co-Upcycled Mixture-of-Experts ☆138 Updated 7 months ago
- ☆89 Updated last year
- ☆11 Updated last year
- SimVLM ---SIMPLE VISUAL LANGUAGE MODEL PRETRAINING WITH WEAK SUPERVISION ☆35 Updated 2 years ago
- Official implementation of "ConZIC: Controllable Zero-shot Image Captioning by Sampling-Based Polishing" ☆74 Updated last year
- Positive-Augmented Contrastive Learning for Image and Video Captioning Evaluation. CVPR 2023 ☆58 Updated 2 months ago
- Using LSTM or Transformer to solve Image Captioning in Pytorch ☆76 Updated 3 years ago
- (WACV 2025) Vision-language conversation in 10 languages including English, Chinese, French, Spanish, Russian, Japanese, Arabic, Hindi, B… ☆81 Updated 4 months ago