CapDec: SOTA Zero Shot Image Captioning Using CLIP and GPT2, EMNLP 2022 (findings)
☆203 · Jan 28, 2024 · Updated 2 years ago
Alternatives and similar repositories for CapDec
Users interested in CapDec are comparing it to the repositories listed below.
- ICLR 2023 DeCap: Decoding CLIP Latents for Zero-shot Captioning · ☆138 · Mar 16, 2023 · Updated 2 years ago
- ☆59 · Aug 30, 2023 · Updated 2 years ago
- Simple image captioning model · ☆1,408 · Jun 9, 2024 · Updated last year
- Language Models Can See: Plugging Visual Controls in Text Generation · ☆259 · Jun 1, 2022 · Updated 3 years ago
- Implementation of Zero-Shot Image-to-Text Generation for Visual-Semantic Arithmetic · ☆278 · Sep 17, 2022 · Updated 3 years ago
- Transferable Decoding with Visual Entities for Zero-Shot Image Captioning, ICCV 2023 · ☆162 · Sep 9, 2024 · Updated last year
- PyTorch code for "VL-Adapter: Parameter-Efficient Transfer Learning for Vision-and-Language Tasks" (CVPR2022) · ☆209 · Dec 18, 2022 · Updated 3 years ago
- METER: A Multimodal End-to-end TransformER Framework · ☆376 · Nov 16, 2022 · Updated 3 years ago
- Repository for the paper "Data Efficient Masked Language Modeling for Vision and Language" · ☆18 · Sep 17, 2021 · Updated 4 years ago
- [ICLR2023] PLOT: Prompt Learning with Optimal Transport for Vision-Language Models · ☆175 · Dec 14, 2023 · Updated 2 years ago
- Research code for CVPR 2022 paper: "EMScore: Evaluating Video Captioning via Coarse-Grained and Fine-Grained Embedding Matching" · ☆26 · Oct 20, 2022 · Updated 3 years ago
- PyTorch code for "Fine-grained Image Captioning with CLIP Reward" (Findings of NAACL 2022) · ☆246 · Jun 10, 2025 · Updated 8 months ago
- GRIT: Faster and Better Image-captioning Transformer (ECCV 2022) · ☆198 · May 9, 2023 · Updated 2 years ago
- PyTorch code for Language Models with Image Descriptors are Strong Few-Shot Video-Language Learners · ☆116 · Sep 15, 2022 · Updated 3 years ago
- A simple and effective feature extractor for untrimmed videos · ☆13 · Sep 1, 2022 · Updated 3 years ago
- CaMEL: Mean Teacher Learning for Image Captioning. ICPR 2022 · ☆29 · Dec 1, 2022 · Updated 3 years ago
- [ICLR 2022] Code for "How Much Can CLIP Benefit Vision-and-Language Tasks?" https://arxiv.org/abs/2107.06383 · ☆421 · Oct 28, 2022 · Updated 3 years ago
- ☆200 · May 10, 2023 · Updated 2 years ago
- Evaluating Vision & Language Pretraining Models with Objects, Attributes and Relations. [EMNLP 2022] · ☆136 · Sep 29, 2024 · Updated last year
- Align and Prompt: Video-and-Language Pre-training with Entity Prompts · ☆188 · May 1, 2025 · Updated 9 months ago
- Mind the Gap: Understanding the Modality Gap in Multi-modal Contrastive Representation Learning · ☆169 · Sep 26, 2022 · Updated 3 years ago
- Implementation of the paper https://arxiv.org/abs/2210.04559 · ☆56 · Nov 26, 2025 · Updated 3 months ago
- Momentum Decoding: Open-ended Text Generation as Graph Exploration · ☆19 · Jan 27, 2023 · Updated 3 years ago
- A Good Prompt Is Worth Millions of Parameters: Low-resource Prompt-based Learning for Vision-Language Models (ACL 2022) · ☆42 · May 13, 2022 · Updated 3 years ago
- [ACL 2023] Official PyTorch code for Singularity model in "Revealing Single Frame Bias for Video-and-Language Learning" · ☆136 · May 5, 2023 · Updated 2 years ago
- ☆194 · Mar 5, 2025 · Updated 11 months ago
- ☆47 · Apr 29, 2024 · Updated last year
- Cross Modal Retrieval with Querybank Normalisation · ☆57 · Nov 21, 2023 · Updated 2 years ago
- [CVPR 2022] Official code for "RegionCLIP: Region-based Language-Image Pretraining" · ☆807 · Mar 20, 2024 · Updated last year
- Experiments and data for the paper "When and why vision-language models behave like bags-of-words, and what to do about it?" Oral @ ICLR … · ☆292 · Jun 7, 2023 · Updated 2 years ago
- 🧀 Code and models for the ICML 2023 paper "Grounding Language Models to Images for Multimodal Inputs and Outputs" · ☆486 · Oct 30, 2023 · Updated 2 years ago
- Supervision Exists Everywhere: A Data Efficient Contrastive Language-Image Pre-training Paradigm · ☆675 · Sep 19, 2022 · Updated 3 years ago
- Densely Captioned Images (DCI) dataset repository · ☆196 · Jul 1, 2024 · Updated last year
- [NeurIPS'22 Spotlight] A Contrastive Framework for Neural Text Generation · ☆475 · Mar 7, 2024 · Updated last year
- [ECCV 2024] Official PyTorch implementation of "HYPE: Hyperbolic Entailment Filtering for Underspecified Images and Texts" · ☆20 · Nov 22, 2024 · Updated last year
- Grounded Language-Image Pre-training · ☆2,572 · Jan 24, 2024 · Updated 2 years ago
- [NeurIPS2023] Official implementation and model release of the paper "What Makes Good Examples for Visual In-Context Learning?" · ☆183 · Mar 4, 2024 · Updated last year
- Official implementation and data release of the paper "Visual Prompting via Image Inpainting" · ☆318 · Aug 7, 2023 · Updated 2 years ago
- Human-like Controllable Image Captioning with Verb-specific Semantic Roles · ☆36 · Mar 11, 2022 · Updated 3 years ago