jmisilo / clip-gpt-captioning
CLIPxGPT Captioner is an image captioning model based on OpenAI's CLIP and GPT-2.
☆118 · Updated 9 months ago
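The pairing works by conditioning GPT-2's text generation on a CLIP image embedding. Below is a minimal sketch of that idea in the spirit of ClipCap-style prefix mapping; the class name, the linear mapper, and the prefix length are illustrative assumptions, not the repository's actual architecture:

```python
# Minimal sketch of a CLIP -> GPT-2 captioner (illustrative, not the repo's
# exact architecture). A linear "mapper" turns the CLIP image embedding into
# a short prefix of GPT-2 token embeddings that conditions text generation.
import torch
import torch.nn as nn
from transformers import CLIPModel, CLIPProcessor, GPT2LMHeadModel, GPT2Tokenizer


class ClipGPT2Captioner(nn.Module):
    def __init__(self, prefix_len: int = 10):
        super().__init__()
        self.clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
        self.gpt2 = GPT2LMHeadModel.from_pretrained("gpt2")
        self.prefix_len = prefix_len
        # Project the 512-d CLIP embedding to prefix_len GPT-2 embeddings.
        self.mapper = nn.Linear(
            self.clip.config.projection_dim,
            prefix_len * self.gpt2.config.n_embd,
        )

    @torch.no_grad()
    def caption(self, pixel_values, tokenizer, max_new_tokens: int = 30):
        image_emb = self.clip.get_image_features(pixel_values=pixel_values)
        prefix = self.mapper(image_emb).view(
            -1, self.prefix_len, self.gpt2.config.n_embd
        )
        # With inputs_embeds, generate() returns only the new token ids.
        out = self.gpt2.generate(
            inputs_embeds=prefix,
            max_new_tokens=max_new_tokens,
            pad_token_id=tokenizer.eos_token_id,
        )
        return tokenizer.batch_decode(out, skip_special_tokens=True)


if __name__ == "__main__":
    from PIL import Image

    processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
    tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
    model = ClipGPT2Captioner().eval()
    inputs = processor(images=Image.open("photo.jpg"), return_tensors="pt")
    # NOTE: the mapper is untrained here; meaningful captions require
    # training it on paired image-caption data (e.g. COCO) first.
    print(model.caption(inputs["pixel_values"], tokenizer))
```

The mapper is the only new module; both CLIP and GPT-2 can stay frozen while it is trained, which is what makes this family of captioners cheap to fit.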
Alternatives and similar repositories for clip-gpt-captioning
Users interested in clip-gpt-captioning are comparing it to the libraries listed below.
- CapDec: SOTA Zero-Shot Image Captioning Using CLIP and GPT-2, EMNLP 2022 (Findings) ☆202 · Updated last year
- Implementation code for "Exploiting Multiple Sequence Lengths in Fast End to End Training for Image Captioning" ☆94 · Updated 11 months ago
- Generate text captions for images from their embeddings. ☆116 · Updated 2 years ago
- PyTorch code for "Fine-grained Image Captioning with CLIP Reward" (Findings of NAACL 2022) ☆246 · Updated 6 months ago
- Using pretrained encoder and language models to generate captions from multimedia inputs. ☆98 · Updated 2 years ago
- Implementation of Zero-Shot Image-to-Text Generation for Visual-Semantic Arithmetic ☆280 · Updated 3 years ago
- PyTorch implementation of image captioning using a transformer-based model. ☆68 · Updated 2 years ago
- Official implementation of "ConZIC: Controllable Zero-shot Image Captioning by Sampling-Based Polishing" ☆75 · Updated 2 years ago
- [TMM 2023] VideoXum: Cross-modal Visual and Textural Summarization of Videos ☆53 · Updated last year
- Implementation of DeepMind's Flamingo vision-language model, based on Hugging Face language models and ready for training ☆168 · Updated 2 years ago
- Fine-tuning "ImageBind One Embedding Space to Bind Them All" with LoRA ☆193 · Updated 2 years ago
- Official repository for the LENS (Large Language Models Enhanced to See) system. ☆357 · Updated 4 months ago
- An easy-to-use, efficient codebase for extracting OpenAI CLIP (global/grid) features from images and text. ☆134 · Updated 11 months ago
- Democratization of "PaLI: A Jointly-Scaled Multilingual Language-Image Model" ☆92 · Updated last year
- VisualGPT (CVPR 2022): GPT as a decoder for vision-language models ☆338 · Updated 2 years ago
- Fine-tuning OpenAI's CLIP model for image search on medical images ☆77 · Updated 3 years ago
- SimVLM: Simple Visual Language Model Pretraining with Weak Supervision ☆36 · Updated 3 years ago
- [ACM TOMM 2023] Composed Image Retrieval using Contrastive Learning and Task-oriented CLIP-based Features ☆188 · Updated 2 years ago
- (WACV 2025 Oral) Vision-language conversation in 10 languages including English, Chinese, French, Spanish, Russian, Japanese, Arabic, H… ☆84 · Updated 4 months ago
- GRIT: Faster and Better Image-captioning Transformer (ECCV 2022) ☆197 · Updated 2 years ago
- ☆48 · Updated 4 years ago
- Fine-tuning OpenAI's CLIP model on an Indian fashion dataset ☆52 · Updated 2 years ago
- PyTorch code for "TVLT: Textless Vision-Language Transformer" (NeurIPS 2022 Oral) ☆124 · Updated 2 years ago
- Code and models for "GeneCIS: A Benchmark for General Conditional Image Similarity" ☆61 · Updated 2 years ago
- (ACL 2023) MultiCapCLIP: Auto-Encoding Prompts for Zero-Shot Multilingual Visual Captioning ☆36 · Updated last year
- A simple script that reads a directory of videos, grabs a random frame, and automatically discovers a prompt for it ☆142 · Updated last year
- A curated list of video-text datasets in a variety of languages. These datasets can be used for video captioning (video description) or v… ☆39 · Updated last year
- Code for the CLIPScore captioning metric (EMNLP 2021); a minimal sketch of the metric appears after this list. ☆241 · Updated 2 years ago
- PG-Video-LLaVA: Pixel Grounding in Large Multimodal Video Models ☆260 · Updated 4 months ago
- Code/Data for the paper "LLaVAR: Enhanced Visual Instruction Tuning for Text-Rich Image Understanding" ☆269 · Updated last year
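For the CLIPScore entry above: the metric is reference-free, scoring a caption c against an image v as 2.5 · max(cos(CLIP(c), CLIP(v)), 0). Here is a minimal sketch under the assumption of a standard Hugging Face CLIP checkpoint; the official repository's preprocessing details may differ:

```python
# Hedged sketch of CLIPScore ("CLIPScore: A Reference-free Evaluation Metric
# for Image Captioning", EMNLP 2021): w * max(cos(image_emb, text_emb), 0)
# with w = 2.5. The checkpoint below is a common choice, not necessarily the
# one used by the official code.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").eval()
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")


@torch.no_grad()
def clip_score(image: Image.Image, caption: str, w: float = 2.5) -> float:
    inputs = processor(
        text=[caption], images=image, return_tensors="pt", padding=True
    )
    img_emb = model.get_image_features(pixel_values=inputs["pixel_values"])
    txt_emb = model.get_text_features(
        input_ids=inputs["input_ids"],
        attention_mask=inputs["attention_mask"],
    )
    cos = torch.nn.functional.cosine_similarity(img_emb, txt_emb).item()
    # Clamp at zero so unrelated captions score 0 rather than negative.
    return w * max(cos, 0.0)


if __name__ == "__main__":
    print(clip_score(Image.open("photo.jpg"), "a dog playing in the park"))
```

Because no reference captions are needed, the same function can rank candidate captions for an image directly, which is why the metric appears alongside so many of the captioning repos listed above.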