jmisilo / clip-gpt-captioning

CLIPxGPT Captioner is an image captioning model based on OpenAI's CLIP and GPT-2.

☆117

Alternatives and similar repositories for clip-gpt-captioning

Users who are interested in clip-gpt-captioning are comparing it to the repositories listed below.

jchenghu / ExpansionNet_v2
Implementation code of the work "Exploiting Multiple Sequence Lengths in Fast End to End Training for Image Captioning"
☆92 Updated 6 months ago
shashnkvats / Indofashionclip
Fine tuning OpenAI's CLIP model on Indian Fashion Dataset
☆50 Updated 2 years ago
fkodom / clip-text-decoder
Generate text captions for images from their embeddings.
☆110 Updated last year
zarzouram / image_captioning_with_transformers
PyTorch implementation of image captioning using a transformer-based model.
☆66 Updated 2 years ago
DavidHuji / CapDec
CapDec: SOTA Zero Shot Image Captioning Using CLIP and GPT2, EMNLP 2022 (Findings)
☆197 Updated last year
YoadTew / zero-shot-image-to-text
Implementation of Zero-Shot Image-to-Text Generation for Visual-Semantic Arithmetic
☆278 Updated 2 years ago
j-min / CLIP-Caption-Reward
PyTorch code for "Fine-grained Image Captioning with CLIP Reward" (Findings of NAACL 2022)
☆243 Updated last month
LexTypeC / smlr
A Simple Image Clustering Script using CLIP and Hierarchical Clustering
☆38 Updated 2 years ago
TheoCoombes / ClipCap
Using pretrained encoder and language models to generate captions from multimedia inputs.
☆97 Updated 2 years ago
dhansmair / flamingo-mini
Implementation of the DeepMind Flamingo vision-language model, based on Hugging Face language models and ready for training
☆167 Updated 2 years ago
elsevierlabs-os / clip-image-search
Fine-tuning OpenAI CLIP Model for Image Search on medical images
☆76 Updated 3 years ago
kyegomez / PALI
Democratization of "PaLI: A Jointly-Scaled Multilingual Language-Image Model"
☆91 Updated last year
gregor-ge / mBLIP
☆86 Updated last year
hananshafi / llmblueprint
[ICLR 2024] Official code for the paper "LLM Blueprint: Enabling Text-to-Image Generation with Complex and Detailed Prompts"
☆80 Updated last year
ExponentialML / Video-BLIP2-Preprocessor
A simple script that reads a directory of videos, grabs a random frame, and automatically discovers a prompt for it
☆138 Updated last year
LAION-AI / General-GPT
☆64 Updated last year
Vision-CAIR / VisualGPT
VisualGPT, CVPR 2022 Proceeding, GPT as a decoder for vision-language models
☆334 Updated 2 years ago
ahmedssabir / Textual-Visual-Semantic-Dataset
Visual Semantic Relatedness Dataset for Captioning. CVPRW 2023
☆10 Updated last year
tezansahu / VQA-With-Multimodal-Transformers
Exploring multimodal fusion-type transformer models for visual question answering (on DAQUAR dataset)
☆36 Updated 3 years ago
fabawi / ImageBind-LoRA
Fine-tuning "ImageBind One Embedding Space to Bind Them All" with LoRA
☆185 Updated last year
davidnvq / grit
GRIT: Faster and Better Image-captioning Transformer (ECCV 2022)
☆194 Updated 2 years ago
mbzuai-oryx / PALO
(WACV 2025 - Oral) Vision-language conversation in 10 languages including English, Chinese, French, Spanish, Russian, Japanese, Arabic, H…
☆84 Updated 5 months ago
aimagelab / pacscore
[CVPR 2023] Positive-Augmented Contrastive Learning for Image and Video Captioning Evaluation
☆62 Updated 4 months ago
ABaldrati / CLIP4Cir
[ACM TOMM 2023] - Composed Image Retrieval using Contrastive Learning and Task-oriented CLIP-based Features
☆180 Updated last year
mshukor / UnIVAL
[TMLR23] Official implementation of UnIVAL: Unified Model for Image, Video, Audio and Language Tasks.
☆228 Updated last year
afiaka87 / retrieval-augmented-diffusion
Retrieval augmented diffusion from CompVis.
☆53 Updated 2 years ago
sehyunkwon / ICTC
This is a public repository for Image Clustering Conditioned on Text Criteria (IC|TC)
☆90 Updated last year
jylins / videoxum
[TMM 2023] VideoXum: Cross-modal Visual and Textural Summarization of Videos
☆45 Updated last year
LAION-AI / phenaki
A Phenaki reproduction using PyTorch.
☆220 Updated last year
joanrod / ocr-vqgan
OCR-VQGAN, a discrete image encoder (tokenizer and detokenizer) for figure images in Paper2Fig100k dataset. Implementation of OCR Percept…
☆81 Updated 2 years ago