jmisilo / clip-gpt-captioning
CLIPxGPT Captioner is an image captioning model based on OpenAI's CLIP and GPT-2.
☆117 · Updated 5 months ago
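Most captioners in this family follow the same recipe: encode the image with CLIP, project the image embedding into GPT-2's input embedding space as a short "prefix", and let GPT-2 decode the caption. The snippet below is a minimal, ClipCap-style sketch of that idea, not the exact architecture of this repository; the class name `PrefixCaptioner`, the `prefix_len` parameter, and the example image path are illustrative, and `generate(inputs_embeds=...)` assumes a recent version of Hugging Face transformers.

```python
# Minimal sketch of a CLIP -> GPT-2 captioning pipeline (prefix conditioning).
# Names such as PrefixCaptioner and prefix_len are illustrative, not taken
# from this repository.
import torch
import torch.nn as nn
from transformers import CLIPModel, CLIPProcessor, GPT2LMHeadModel, GPT2Tokenizer


class PrefixCaptioner(nn.Module):
    def __init__(self, prefix_len: int = 10):
        super().__init__()
        self.clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
        self.gpt2 = GPT2LMHeadModel.from_pretrained("gpt2")
        self.prefix_len = prefix_len
        clip_dim = self.clip.config.projection_dim   # 512 for ViT-B/32
        gpt_dim = self.gpt2.config.n_embd            # 768 for GPT-2 small
        # Map one CLIP image embedding to a sequence of GPT-2 prefix embeddings.
        self.project = nn.Linear(clip_dim, prefix_len * gpt_dim)

    @torch.no_grad()
    def caption(self, pixel_values, tokenizer, max_new_tokens: int = 30) -> str:
        image_emb = self.clip.get_image_features(pixel_values=pixel_values)  # (1, 512)
        prefix = self.project(image_emb).view(1, self.prefix_len, -1)        # (1, L, 768)
        out = self.gpt2.generate(
            inputs_embeds=prefix,
            max_new_tokens=max_new_tokens,
            pad_token_id=tokenizer.eos_token_id,
        )
        return tokenizer.decode(out[0], skip_special_tokens=True)


if __name__ == "__main__":
    from PIL import Image

    processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
    tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
    model = PrefixCaptioner()
    # With an untrained projection the output is gibberish; the projection
    # (and optionally GPT-2) must first be trained on image-caption pairs.
    image = Image.open("example.jpg")  # hypothetical path
    pixel_values = processor(images=image, return_tensors="pt").pixel_values
    print(model.caption(pixel_values, tokenizer))
```

Many of the repositories listed below vary only in which pieces are frozen (CLIP, GPT-2, or both) and how the projection between the two embedding spaces is parameterized.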
Alternatives and similar repositories for clip-gpt-captioning
Users interested in clip-gpt-captioning are comparing it to the repositories listed below.
- Implementation code of the work "Exploiting Multiple Sequence Lengths in Fast End to End Training for Image Captioning" ☆92 · Updated 6 months ago
- Fine-tuning OpenAI's CLIP model on an Indian Fashion Dataset ☆50 · Updated 2 years ago
- Generate text captions for images from their embeddings. ☆110 · Updated last year
- PyTorch implementation of image captioning using a transformer-based model. ☆66 · Updated 2 years ago
- CapDec: SOTA zero-shot image captioning using CLIP and GPT-2, EMNLP 2022 (Findings) ☆197 · Updated last year
- Implementation of Zero-Shot Image-to-Text Generation for Visual-Semantic Arithmetic ☆278 · Updated 2 years ago
- PyTorch code for "Fine-grained Image Captioning with CLIP Reward" (Findings of NAACL 2022) ☆243 · Updated last month
- A simple image clustering script using CLIP and hierarchical clustering ☆38 · Updated 2 years ago
- Using pretrained encoder and language models to generate captions from multimedia inputs. ☆97 · Updated 2 years ago
- Implementation of the DeepMind Flamingo vision-language model, based on Hugging Face language models and ready for training ☆167 · Updated 2 years ago
- Fine-tuning the OpenAI CLIP model for image search on medical images ☆76 · Updated 3 years ago
- Democratization of "PaLI: A Jointly-Scaled Multilingual Language-Image Model" ☆91 · Updated last year
- ☆86 · Updated last year
- [ICLR 2024] Official code for the paper "LLM Blueprint: Enabling Text-to-Image Generation with Complex and Detailed Prompts" ☆80 · Updated last year
- A simple script that reads a directory of videos, grabs a random frame, and automatically discovers a prompt for it ☆138 · Updated last year
- ☆64 · Updated last year
- VisualGPT (CVPR 2022): GPT as a decoder for vision-language models ☆334 · Updated 2 years ago
- Visual Semantic Relatedness Dataset for Captioning, CVPRW 2023 ☆10 · Updated last year
- Exploring multimodal fusion-type transformer models for visual question answering (on the DAQUAR dataset) ☆36 · Updated 3 years ago
- Fine-tuning "ImageBind One Embedding Space to Bind Them All" with LoRA ☆185 · Updated last year
- GRIT: Faster and Better Image-captioning Transformer (ECCV 2022) ☆194 · Updated 2 years ago
- (WACV 2025, Oral) Vision-language conversation in 10 languages including English, Chinese, French, Spanish, Russian, Japanese, Arabic, H… ☆84 · Updated 5 months ago
- [CVPR 2023] Positive-Augmented Contrastive Learning for Image and Video Captioning Evaluation ☆62 · Updated 4 months ago
- [ACM TOMM 2023] Composed Image Retrieval using Contrastive Learning and Task-oriented CLIP-based Features ☆180 · Updated last year
- [TMLR23] Official implementation of UnIVAL: Unified Model for Image, Video, Audio and Language Tasks ☆228 · Updated last year
- Retrieval-augmented diffusion from CompVis ☆53 · Updated 2 years ago
- A public repository for Image Clustering Conditioned on Text Criteria (IC|TC) ☆90 · Updated last year
- [TMM 2023] VideoXum: Cross-modal Visual and Textual Summarization of Videos ☆45 · Updated last year
- A Phenaki reproduction using PyTorch ☆220 · Updated last year
- OCR-VQGAN, a discrete image encoder (tokenizer and detokenizer) for figure images in the Paper2Fig100k dataset. Implementation of OCR Percept… ☆81 · Updated 2 years ago