RAIVNLab / sugar-crepeLinks

[NeurIPS 2023] A faithful benchmark for vision-language compositionality

☆86

Alternatives and similar repositories for sugar-crepe

Users that are interested in sugar-crepe are comparing it to the libraries listed below

Sorting:

mertyg / vision-language-models-are-bows
Experiments and data for the paper "When and why vision-language models behave like bags-of-words, and what to do about it?" Oral @ ICLR …
☆286Updated 2 years ago
amitakamath / whatsup_vlms
Code and datasets for "What’s “up” with vision-language models? Investigating their struggle with spatial reasoning".
☆62Updated last year
allenai / aokvqa
Official repository for the A-OKVQA dataset
☆100Updated last year
edchengg / oven_eval
ICCV 2023 (Oral) Open-domain Visual Entity Recognition Towards Recognizing Millions of Wikipedia Entities
☆43Updated 4 months ago
RAIVNLab / CREPE
[CVPR23 Highlight] CREPE: Can Vision-Language Foundation Models Reason Compositionally?
☆35Updated 2 years ago
ajd12342 / why-winoground-hard
Code for 'Why is Winoground Hard? Investigating Failures in Visuolinguistic Compositionality', EMNLP 2022
☆31Updated 2 years ago
MikeWangWZHL / Paxion
Repo for paper: "Paxion: Patching Action Knowledge in Video-Language Foundation Models" Neurips 23 Spotlight
☆37Updated 2 years ago
SivanDoveh / DAC
Repository for the paper: dense and aligned captions (dac) promote compositional reasoning in vl models
☆27Updated last year
vinid / neg_clip
NegCLIP.
☆37Updated 2 years ago
allenai / close
☆59Updated 2 years ago
google-deepmind / svo_probes
The SVO-Probes Dataset for Verb Understanding
☆31Updated 3 years ago
cambridgeltl / visual-spatial-reasoning
[TACL'23] VSR: A probing benchmark for spatial undersranding of vision-language models.
☆130Updated 2 years ago
bcdnlp / FAITHSCORE
FaithScore: Fine-grained Evaluations of Hallucinations in Large Vision-Language Models
☆30Updated 7 months ago
yuhui-zh15 / drml
Official Code Release for "Diagnosing and Rectifying Vision Models using Language" (ICLR 2023)
☆34Updated 2 years ago
ExplainableML / WaffleCLIP
Official repository for the ICCV 2023 paper: "Waffling around for Performance: Visual Classification with Random Words and Broad Concepts…
☆59Updated 2 years ago
Weixin-Liang / Modality-Gap
Mind the Gap: Understanding the Modality Gap in Multi-modal Contrastive Representation Learning
☆161Updated 3 years ago
saibr / hypvl
This repository is related to 'Intriguing Properties of Hyperbolic Embeddings in Vision-Language Models', published at TMLR (2024), https…
☆20Updated last year
YiyangZhou / POVID
[Arxiv] Aligning Modalities in Vision Large Language Models via Preference Fine-tuning
☆88Updated last year
yuhui-zh15 / C3
Official implementation of "Connect, Collapse, Corrupt: Learning Cross-Modal Tasks with Uni-Modal Data" (ICLR 2024)
☆34Updated last year
yuhui-zh15 / VLMClassifier
Official implementation of "Why are Visually-Grounded Language Models Bad at Image Classification?" (NeurIPS 2024)
☆91Updated last year
arijitray1993 / COLA
COLA: Evaluate how well your vision-language model can Compose Objects Localized with Attributes!
☆24Updated 11 months ago
mrwu-mac / R-Bench
[ICML2024] Repo for the paper `Evaluating and Analyzing Relationship Hallucinations in Large Vision-Language Models'
☆21Updated 9 months ago
lezhang7 / Enhance-FineGrained
[CVPR 2024] Contrasting Intra-Modal and Ranking Cross-Modal Hard Negatives to Enhance Visio-Linguistic Fine-grained Understanding
☆51Updated 6 months ago
YiyangZhou / LURE
[ICLR 2024] Analyzing and Mitigating Object Hallucination in Large Vision-Language Models
☆149Updated last year
aimagelab / pacscore
[CVPR 2023 & IJCV 2025] Positive-Augmented Contrastive Learning for Image and Video Captioning Evaluation
☆64Updated 2 months ago
LisaAnne / Hallucination
☆81Updated 6 years ago
goel-shashank / CyCLIP
☆120Updated 2 years ago
showlab / datacentric.vlp
Compress conventional Vision-Language Pre-training data
☆52Updated 2 years ago
ys-zong / VL-ICL
[ICLR 2025] VL-ICL Bench: The Devil in the Details of Multimodal In-Context Learning
☆65Updated last month
kdariina / CLIP-not-BoW-unimodally
Code for "CLIP Behaves like a Bag-of-Words Model Cross-modally but not Uni-modally"
☆16Updated 8 months ago