FeiElysia/awesome-zero-shot-captioning

FeiElysia / awesome-zero-shot-captioning

A curated list of zero-shot captioning papers

☆24

Alternatives and similar repositories for awesome-zero-shot-captioning

Users who are interested in awesome-zero-shot-captioning are comparing it to the repositories listed below.

adobe-research / llava-score
☆11 · Oct 2, 2024 · Updated last year
FeiElysia / ViECap
Transferable Decoding with Visual Entities for Zero-Shot Image Captioning, ICCV 2023
☆167 · Sep 9, 2024 · Updated last year
arijitray1993 / COLA
COLA: Evaluate how well your vision-language model can Compose Objects Localized with Attributes!
☆25 · May 14, 2026 · Updated 2 months ago
aimagelab / PMA-Net
[ICCV 2023] With a Little Help from your own Past: Prototypical Memory Networks for Image Captioning.
☆19 · Jun 7, 2024 · Updated 2 years ago
XLiu443 / Tem-adapter
[ICCV2023] Tem-adapter: Adapting Image-Text Pretraining for Video Question Answer
☆37 · Oct 18, 2023 · Updated 2 years ago
wzz618 / wozaixiaoyuan
Various APIs for the Wozaixiaoyuan (我在校园) app, with automated scripts and multi-user support
☆12 · Jun 28, 2022 · Updated 4 years ago
visinf / fldr-vfi
Efficient Feature Extraction for High-resolution Video Frame Interpolation (BMVC 2022)
☆14 · Aug 24, 2023 · Updated 2 years ago
hfutmars / MGCL
The complete codes of the paper "Multimodal Graph Contrastive Learning for Recommendation"
☆10 · Mar 20, 2023 · Updated 3 years ago
ytaek-oh / awesome-vl-compositionality
Awesome Vision-Language Compositionality, a comprehensive curation of research papers in literature.
☆40 · Feb 13, 2025 · Updated last year
Qrange-group / Mirror-Gradient
WWW'24, Mirror Gradient (MG) makes multimodal recommendation models approach flat local minima easier compared to models with normal trai…
☆17 · Nov 1, 2024 · Updated last year
fqldom / BeFA
BeFA: A General Behavior-driven Feature Adapter for Multimedia Recommendation
☆13 · Feb 21, 2025 · Updated last year
weimingboya / DFT
☆13 · Jun 2, 2023 · Updated 3 years ago
ChenyuHeidiZhang / VL-commonsense
☆14 · May 23, 2022 · Updated 4 years ago
Neon-Jing / Guider
[WSDM 2025] Source code for "Teach Me How to Denoise: A Universal Framework for Denoising Multi-modal Recommender Systems via Guided Cali…
☆14 · Oct 14, 2025 · Updated 9 months ago
NanGongNingYi / GUME
☆16 · Jul 19, 2024 · Updated 2 years ago
TerminologyHub / termhub-in-5-minutes
Developer project for getting basic API integrations working in under 5 minutes
☆11 · May 22, 2026 · Updated 2 months ago
RyanLiut / awesome-diverse-captioning
Some papers about *diverse* image (a few videos) captioning
☆25 · Apr 4, 2023 · Updated 3 years ago
xyfJASON / mathematical-modeling-python
Python codes for mathematical modeling.
☆13 · Sep 5, 2021 · Updated 4 years ago
Muhammad-Ullah / tflite_image_classification
Flutter repository based on tflite model for image recognition
☆30 · Apr 1, 2022 · Updated 4 years ago
aburns4 / textualforesight
☆12 · Aug 8, 2024 · Updated last year
XuRui314 / GLM4v-Finetune
Support finetuning GLM4v with zero2
☆16 · Jun 29, 2024 · Updated 2 years ago
fawazsammani / awesome-xai
Papers about Explainable AI (Deep Learning-based)
☆29 · Nov 14, 2025 · Updated 8 months ago
TencentARC / FLM
Accelerating Vision-Language Pretraining with Free Language Modeling (CVPR 2023)
☆31 · May 15, 2023 · Updated 3 years ago
saibr / hypvl
This repository is related to 'Intriguing Properties of Hyperbolic Embeddings in Vision-Language Models', published at TMLR (2024), https…
☆21 · Jul 5, 2024 · Updated 2 years ago
Aymanbegh / CD-COCO
☆17 · Nov 30, 2023 · Updated 2 years ago
iOPENCap / awesome-unimodal-training
text-only training or language-free training for multimodal tasks (image/audio/video caption, retrieval, text2image)
☆13 · Oct 15, 2024 · Updated last year
UCSB-AI / ComCLIP
Official implementation and dataset for the NAACL 2024 paper "ComCLIP: Training-Free Compositional Image and Text Matching"
☆37 · Aug 18, 2024 · Updated last year
TencentARC / pi-Tuning
Official code for "pi-Tuning: Transferring Multimodal Foundation Models with Optimal Multi-task Interpolation", ICML 2023.
☆33 · Jul 21, 2023 · Updated 3 years ago
enrico310786 / brain_tumor_classification
Brain tumor images classification with ResNet, EfficientNet, EfficientNet_V2 and Compact Convolutional Transformers architectures with Py…
☆11 · Jan 5, 2023 · Updated 3 years ago
Jiaxuan-Li / EVCap
[CVPR 2024] Retrieval-Augmented Image Captioning with External Visual-Name Memory for Open-World Comprehension
☆64 · Apr 8, 2024 · Updated 2 years ago
YeeZ93 / Awesome-Object-Centric-Learning
A curated list of researches in object-centric learning
☆11 · Oct 14, 2024 · Updated last year
val-iisc / VL2V-ADiP
[CVPR 2024] Leveraging Vision-Language Models for Improving Domain Generalization in Image Classification
☆43 · Mar 6, 2024 · Updated 2 years ago
aimagelab / pacscore
[CVPR 2023 & IJCV 2025] Positive-Augmented Contrastive Learning for Image and Video Captioning Evaluation
☆66 · Jul 29, 2025 · Updated last year
quangvnai / grit
GRIT: Faster and Better Image-captioning Transformer (ECCV 2022)
☆199 · May 9, 2023 · Updated 3 years ago
DeepExperience / REAL
Rewards as Labels: Revisiting RLVR from a Classification Perspective
☆24 · Jun 26, 2026 · Updated last month
roymiles / ITRD
[BMVC 2022] Information Theoretic Representation Distillation
☆19 · Oct 6, 2023 · Updated 2 years ago
baaaad / ECE
[ECCV'22 Poster] Explicit Image Caption Editing
☆22 · Nov 30, 2022 · Updated 3 years ago
bearcatt / LaBERT
A length-controllable and non-autoregressive image captioning model.
☆69 · Jun 10, 2021 · Updated 5 years ago
Victorwz / VaLM
VaLM: Visually-augmented Language Modeling. ICLR 2023.
☆56 · Mar 6, 2023 · Updated 3 years ago