DavidHuji/CapDec

CapDec: SOTA Zero Shot Image Captioning Using CLIP and GPT2, EMNLP 2022 (findings)

☆209

Alternatives and similar repositories for CapDec

Users that are interested in CapDec are comparing it to the libraries listed below.

dhg-wei / DeCap
ICLR 2023 DeCap: Decoding CLIP Latents for Zero-shot Captioning
☆144 · Mar 16, 2023 · Updated 3 years ago
allenai / close
☆59 · Aug 30, 2023 · Updated 2 years ago
rmokady / CLIP_prefix_caption
Simple image captioning model
☆1,421 · Jun 9, 2024 · Updated 2 years ago
FeiElysia / ViECap
Transferable Decoding with Visual Entities for Zero-Shot Image Captioning, ICCV 2023
☆167 · Sep 9, 2024 · Updated last year
yxuansu / MAGIC
Language Models Can See: Plugging Visual Controls in Text Generation
☆261 · Jun 1, 2022 · Updated 4 years ago
YoadTew / zero-shot-image-to-text
Implementation of Zero-Shot Image-to-Text Generation for Visual-Semantic Arithmetic
☆279 · Sep 17, 2022 · Updated 3 years ago
aimagelab / camel
CaMEL: Mean Teacher Learning for Image Captioning. ICPR 2022
☆30 · Dec 1, 2022 · Updated 3 years ago
j-min / CLIP-Caption-Reward
PyTorch code for "Fine-grained Image Captioning with CLIP Reward" (Findings of NAACL 2022)
☆246 · Jun 10, 2025 · Updated last year
ylsung / VL_adapter
PyTorch code for "VL-Adapter: Parameter-Efficient Transfer Learning for Vision-and-Language Tasks" (CVPR2022)
☆212 · Dec 18, 2022 · Updated 3 years ago
junyangwang0410 / Knight
SotA text-only image/video method (IJCAI 2023)
☆15 · Jan 9, 2024 · Updated 2 years ago
yonatanbitton / data_efficient_masked_language_modeling_for_vision_and_language
Repository for the paper "Data Efficient Masked Language Modeling for Vision and Language".
☆18 · Sep 17, 2021 · Updated 4 years ago
yangjianxin1 / ClipCap-Chinese
Image captioning model based on ClipCap
☆325 · Apr 1, 2022 · Updated 4 years ago
zjr2000 / Untrimmed-Video-Feature-Extractor
A simple and effective feature extractor for untrimmed videos
☆13 · Sep 1, 2022 · Updated 3 years ago
zdou0830 / METER
METER: A Multimodal End-to-end TransformER Framework
☆377 · Nov 16, 2022 · Updated 3 years ago
CHENGY12 / PLOT
[ICLR2023] PLOT: Prompt Learning with Optimal Transport for Vision-Language Models
☆177 · Dec 14, 2023 · Updated 2 years ago
MikeWangWZHL / VidIL
Pytorch code for Language Models with Image Descriptors are Strong Few-Shot Video-Language Learners
☆117 · Sep 15, 2022 · Updated 3 years ago
XuMengyaAmy / ReportDALS
☆16 · Nov 19, 2020 · Updated 5 years ago
sarahpratt / CuPL
☆203 · May 10, 2023 · Updated 3 years ago
boreng0817 / IFCap
[EMNLP 2024] IFCap: Image-like Retrieval and Frequency-based Entity Filtering for Zero-shot Captioning
☆15 · May 13, 2025 · Updated last year
xu-shitong / diffusion-image-captioning
implementation of paper https://arxiv.org/abs/2210.04559
☆56 · Nov 26, 2025 · Updated 7 months ago
orensul / analogies_mining
☆21 · Mar 19, 2024 · Updated 2 years ago
naver-ai / hype
[ECCV 2024] Official PyTorch implementation of "HYPE: Hyperbolic Entailment Filtering for Underspecified Images and Texts"
☆20 · Nov 22, 2024 · Updated last year
Weixin-Liang / Modality-Gap
Mind the Gap: Understanding the Modality Gap in Multi-modal Contrastive Representation Learning
☆176 · Sep 26, 2022 · Updated 3 years ago
google-research / composed_image_retrieval
☆197 · May 9, 2026 · Updated 2 months ago
om-ai-lab / VL-CheckList
Evaluating Vision & Language Pretraining Models with Objects, Attributes and Relations. [EMNLP 2022]
☆138 · Apr 10, 2026 · Updated 3 months ago
quangvnai / grit
GRIT: Faster and Better Image-captioning Transformer (ECCV 2022)
☆199 · May 9, 2023 · Updated 3 years ago
ShiYaya / emscore
Research code for CVPR 2022 paper: "EMScore: Evaluating Video Captioning via Coarse-Grained and Fine-Grained Embedding Matching"
☆26 · Oct 20, 2022 · Updated 3 years ago
clip-vil / CLIP-ViL
[ICLR 2022] code for "How Much Can CLIP Benefit Vision-and-Language Tasks?" https://arxiv.org/abs/2107.06383
☆419 · Oct 28, 2022 · Updated 3 years ago
salesforce / ALPRO
Align and Prompt: Video-and-Language Pre-training with Entity Prompts
☆188 · May 1, 2025 · Updated last year
k1rezaei / Text-to-concept
☆36 · Feb 5, 2024 · Updated 2 years ago
gmftbyGMFTBY / MomentumDecoding
Momentum Decoding: Open-ended Text Generation as Graph Exploration
☆19 · Jan 27, 2023 · Updated 3 years ago
naver-ai / muco
Official Pytorch implementation of MuCo: Multi-turn Contrastive Learning for Multimodal Embedding Model (CVPR 2026)
☆15 · Apr 16, 2026 · Updated 3 months ago
woojeongjin / FewVLM
A Good Prompt Is Worth Millions of Parameters: Low-resource Prompt-based Learning for Vision-Language Models (ACL 2022)
☆42 · May 13, 2022 · Updated 4 years ago
vishaal27 / SuS-X
Code for the paper: "SuS-X: Training-Free Name-Only Transfer of Vision-Language Models" [ICCV'23]
☆104 · Aug 22, 2023 · Updated 2 years ago
ioanacroi / qb-norm
Cross Modal Retrieval with Querybank Normalisation
☆57 · Nov 21, 2023 · Updated 2 years ago
mertyg / vision-language-models-are-bows
Experiments and data for the paper "When and why vision-language models behave like bags-of-words, and what to do about it?" Oral @ ICLR …
☆294 · Jun 7, 2023 · Updated 3 years ago
microsoft / RegionCLIP
[CVPR 2022] Official code for "RegionCLIP: Region-based Language-Image Pretraining"
☆817 · Mar 20, 2024 · Updated 2 years ago
RitaRamo / smallcap
SmallCap: Lightweight Image Captioning Prompted with Retrieval Augmentation
☆125 · Feb 13, 2024 · Updated 2 years ago
dd-dreams / aft
aft - advanced file transfer.
☆47 · Jun 26, 2026 · Updated 3 weeks ago