HAWLYQ / InfoMetICLinks

☆13

Alternatives and similar repositories for InfoMetIC

Users that are interested in InfoMetIC are comparing it to the libraries listed below

Sorting:

JiwanChung / vlis
☆24Updated 2 years ago
SivanDoveh / DAC
Repository for the paper: dense and aligned captions (dac) promote compositional reasoning in vl models
☆27Updated 2 years ago
aimagelab / pacscore
[CVPR 2023 & IJCV 2025] Positive-Augmented Contrastive Learning for Image and Video Captioning Evaluation
☆64Updated 4 months ago
edchengg / oven_eval
ICCV 2023 (Oral) Open-domain Visual Entity Recognition Towards Recognizing Millions of Wikipedia Entities
☆43Updated 5 months ago
Yushi-Hu / PromptCap
natual language guided image captioning
☆86Updated last year
joeyz0z / ConZIC
Official implementation of "ConZIC: Controllable Zero-shot Image Captioning by Sampling-Based Polishing"
☆75Updated 2 years ago
bcdnlp / FAITHSCORE
FaithScore: Fine-grained Evaluations of Hallucinations in Large Vision-Language Models
☆30Updated 8 months ago
cwj1412 / MSCOCO-Flikcr30K_FG
Benchmark data for "Rethinking Benchmarks for Cross-modal Image-text Retrieval" (SIGIR 2023)
☆25Updated 2 years ago
jmhessel / pycocoevalcap
Python 3 support for the MS COCO caption evaluation tools
☆14Updated last year
yangbang18 / MultiCapCLIP
(ACL'2023) MultiCapCLIP: Auto-Encoding Prompts for Zero-Shot Multilingual Visual Captioning
☆36Updated last year
PLUM-Lab / MultiInstruct
MultiInstruct: Improving Multi-Modal Zero-Shot Learning via Instruction Tuning
☆134Updated 2 years ago
eric-ai-lab / ComCLIP
Official implementation and dataset for the NAACL 2024 paper "ComCLIP: Training-Free Compositional Image and Text Matching"
☆37Updated last year
jungokasai / THumB
☆15Updated 3 years ago
allenai / aokvqa
Official repository for the A-OKVQA dataset
☆104Updated last year
buxiangzhiren / DDCap
☆85Updated 2 years ago
jianjieluo / SCD-Net
[CVPR23] A cascaded diffusion captioning model with a novel semantic-conditional diffusion process that upgrades conventional diffusion m…
☆67Updated last year
Heidelberg-NLP / VALSE
Data repository for the VALSE benchmark.
☆37Updated last year
vinid / neg_clip
NegCLIP.
☆38Updated 2 years ago
YujieLu10 / LLMScore
LLMScore: Unveiling the Power of Large Language Models in Text-to-Image Synthesis Evaluation
☆134Updated 2 years ago
LgQu / TIGeR
Code for paper: Unified Text-to-Image Generation and Retrieval
☆16Updated last year
hsiehjackson / Mr.Right
Mr. Right: Multimodal Retrieval on Representation of ImaGe witH Text
☆24Updated 3 years ago
Hxyou / IdealGPT
Official Code of IdealGPT
☆35Updated 2 years ago
96-Zachary / vse_2ad
☆16Updated 3 years ago
PaulLerner / ViQuAE
Source code and data used in the papers ViQuAE (Lerner et al., SIGIR'22), Multimodal ICT (Lerner et al., ECIR'23) and Cross-modal Retriev…
☆38Updated 11 months ago
zmykevin / UVLP
CVPR 2022 (Oral) Pytorch Code for Unsupervised Vision-and-Language Pre-training via Retrieval-based Multi-Granular Alignment
☆22Updated 3 years ago
DavidHuji / CapDec
CapDec: SOTA Zero Shot Image Captioning Using CLIP and GPT2, EMNLP 2022 (findings)
☆202Updated last year
codezakh / LilT
[ICLR 23] Contrastive Aligned of Vision to Language Through Parameter-Efficient Transfer Learning
☆40Updated 2 years ago
google-deepmind / svo_probes
The SVO-Probes Dataset for Verb Understanding
☆31Updated 3 years ago
open-vision-language / oven
☆40Updated 2 years ago
keio-smilab24 / Polos
[CVPR24 Highlights] Polos: Multimodal Metric Learning from Human Feedback for Image Captioning
☆32Updated 6 months ago