keio-smilab24 / PolosLinks

[CVPR24 Highlights] Polos: Multimodal Metric Learning from Human Feedback for Image Captioning

☆31

Alternatives and similar repositories for Polos

Users that are interested in Polos are comparing it to the libraries listed below

Sorting:

UW-Madison-Lee-Lab / CoBSAT
Implementation and dataset for paper "Can MLLMs Perform Text-to-Image In-Context Learning?"
☆41Updated 4 months ago
naver-ai / prolip
☆53Updated last month
jmerullo / limber
https://arxiv.org/abs/2209.15162
☆52Updated 2 years ago
codezakh / LilT
[ICLR 23] Contrastive Aligned of Vision to Language Through Parameter-Efficient Transfer Learning
☆40Updated 2 years ago
k1rezaei / Text-to-concept
☆35Updated last year
facebookresearch / genecis
Code and Models for "GeneCIS A Benchmark for General Conditional Image Similarity"
☆60Updated 2 years ago
sIncerass / MVLPT
code for "Multitask Vision-Language Prompt Tuning" https://arxiv.org/abs/2211.11720
☆57Updated last year
kaist-ami / BEAF
[ECCV’24] Official repository for "BEAF: Observing Before-AFter Changes to Evaluate Hallucination in Vision-language Models"
☆20Updated 6 months ago
saibr / hypvl
This repository is related to 'Intriguing Properties of Hyperbolic Embeddings in Vision-Language Models', published at TMLR (2024), https…
☆20Updated last year
yonatanbitton / wysiwyr
☆37Updated 2 years ago
ZhangYuanhan-AI / visual_prompt_retrieval
[NeurIPS2023] Official implementation and model release of the paper "What Makes Good Examples for Visual In-Context Learning?"
☆177Updated last year
SivanDoveh / DAC
Repository for the paper: dense and aligned captions (dac) promote compositional reasoning in vl models
☆27Updated last year
NVlabs / PALAVRA
☆53Updated 3 years ago
arijitray1993 / COLA
COLA: Evaluate how well your vision-language model can Compose Objects Localized with Attributes!
☆24Updated 10 months ago
aimagelab / pacscore
[CVPR 2023 & IJCV 2025] Positive-Augmented Contrastive Learning for Image and Video Captioning Evaluation
☆64Updated 2 months ago
kakaobrain / noc
☆46Updated last year
showlab / datacentric.vlp
Compress conventional Vision-Language Pre-training data
☆52Updated 2 years ago
sterzhang / PVIT
Official Repository of Personalized Visual Instruct Tuning
☆32Updated 7 months ago
RAIVNLab / sugar-crepe
[NeurIPS 2023] A faithful benchmark for vision-language compositionality
☆86Updated last year
eric-ai-lab / ComCLIP
Official implementation and dataset for the NAACL 2024 paper "ComCLIP: Training-Free Compositional Image and Text Matching"
☆35Updated last year
YujieLu10 / LLMScore
LLMScore: Unveiling the Power of Large Language Models in Text-to-Image Synthesis Evaluation
☆132Updated last year
aszala / VPEval
VPEval Codebase from Visual Programming for Text-to-Image Generation and Evaluation (NeurIPS 2023)
☆44Updated last year
ExplainableML / WaffleCLIP
Official repository for the ICCV 2023 paper: "Waffling around for Performance: Visual Classification with Random Words and Broad Concepts…
☆59Updated 2 years ago
McGill-NLP / diffusion-itm
Code and data setup for the paper "Are Diffusion Models Vision-and-language Reasoners?"
☆33Updated last year
hammoudhasan / SynthCLIP
Code base of SynthCLIP: CLIP training with purely synthetic text-image pairs from LLMs and TTIs.
☆100Updated 6 months ago
facebookresearch / OTTER
This code provides a PyTorch implementation for OTTER (Optimal Transport distillation for Efficient zero-shot Recognition), as described …
☆69Updated 3 years ago
jmiemirza / MMFM-Challenge
Official repository for the MMFM challenge
☆25Updated last year
RAIVNLab / CREPE
[CVPR23 Highlight] CREPE: Can Vision-Language Foundation Models Reason Compositionally?
☆34Updated 2 years ago
DavidMChan / caption-by-committee
Using LLMs and pre-trained caption models for super-human performance on image captioning.
☆42Updated last year
edchengg / oven_eval
ICCV 2023 (Oral) Open-domain Visual Entity Recognition Towards Recognizing Millions of Wikipedia Entities
☆43Updated 4 months ago