mertyg / vision-language-models-are-bows

Experiments and data for the paper "When and why vision-language models behave like bags-of-words, and what to do about it?" Oral @ ICLR 2023

☆280

Alternatives and similar repositories for vision-language-models-are-bows

Users who are interested in vision-language-models-are-bows are comparing it to the repositories listed below.

RAIVNLab / sugar-crepe
[NeurIPS 2023] A faithful benchmark for vision-language compositionality
☆82 Updated last year
RUCAIBox / POPE
The official GitHub page for ''Evaluating Object Hallucination in Large Vision-Language Models''
☆216 Updated last year
FuxiaoLiu / LRV-Instruction
[ICLR'24] Mitigating Hallucination in Large Multi-Modal Models via Robust Instruction Tuning
☆284 Updated last year
allenai / aokvqa
Official repository for the A-OKVQA dataset
☆96 Updated last year
facebookresearch / DCI
Densely Captioned Images (DCI) dataset repository.
☆187 Updated last year
tsb0601 / MMVP
☆344 Updated last year
yossigandelsman / clip_text_span
official implementation of "Interpreting CLIP's Image Representation via Text-Based Decomposition"
☆217 Updated 2 months ago
YiyangZhou / LURE
[ICLR 2024] Analyzing and Mitigating Object Hallucination in Large Vision-Language Models
☆147 Updated last year
LijieFan / LaCLIP
[NeurIPS 2023] Text data, code and pre-trained models for paper "Improving CLIP Training with Language Rewrites"
☆283 Updated last year
vinid / neg_clip
NegCLIP.
☆34 Updated 2 years ago
Weixin-Liang / Modality-Gap
Mind the Gap: Understanding the Modality Gap in Multi-modal Contrastive Representation Learning
☆158 Updated 2 years ago
sachit-menon / classify_by_description_release
☆168 Updated last year
PLUM-Lab / MultiInstruct
MultiInstruct: Improving Multi-Modal Zero-Shot Learning via Instruction Tuning
☆135 Updated 2 years ago
ylsung / VL_adapter
PyTorch code for "VL-Adapter: Parameter-Efficient Transfer Learning for Vision-and-Language Tasks" (CVPR2022)
☆205 Updated 2 years ago
allenai / close
☆59 Updated last year
yuhui-zh15 / VLMClassifier
Official implementation of "Why are Visually-Grounded Language Models Bad at Image Classification?" (NeurIPS 2024)
☆87 Updated 9 months ago
hendryx-scale / mhal-detect
M-HalDetect Dataset Release
☆25 Updated last year
YiyangZhou / POVID
[Arxiv] Aligning Modalities in Vision Large Language Models via Preference Fine-tuning
☆86 Updated last year
AoiDragon / POPE
[EMNLP'23] The official GitHub page for ''Evaluating Object Hallucination in Large Vision-Language Models''
☆87 Updated last year
tianyi-lab / HallusionBench
[CVPR'24] HallusionBench: You See What You Think? Or You Think What You See? An Image-Context Reasoning Benchmark Challenging for GPT-4V(…
☆293 Updated 8 months ago
LisaAnne / Hallucination
☆75 Updated 6 years ago
Computer-Vision-in-the-Wild / Elevater_Toolkit_IC
Toolkit for Elevater Benchmark
☆73 Updated last year
Yui010206 / SeViLA
[NeurIPS 2023] Self-Chained Image-Language Model for Video Localization and Question Answering
☆187 Updated last year
edchengg / oven_eval
ICCV 2023 (Oral) Open-domain Visual Entity Recognition Towards Recognizing Millions of Wikipedia Entities
☆43 Updated last month
yfzhang114 / LLaVA-Align
[ACM Multimedia 2025] This is the official repo for Debiasing Large Visual Language Models, including a Post-Hoc debias method and Visual…
☆81 Updated 5 months ago
ys-zong / VL-ICL
[ICLR 2025] VL-ICL Bench: The Devil in the Details of Multimodal In-Context Learning
☆62 Updated 5 months ago
TRI-ML / vlm-evaluation
VLM Evaluation: Benchmark for VLMs, spanning text generation tasks from VQA to Captioning
☆120 Updated 10 months ago
sarahpratt / CuPL
☆191 Updated 2 years ago
LAION-AI / scaling-laws-openclip
Reproducible scaling laws for contrastive language-image learning (https://arxiv.org/abs/2212.07143)
☆171 Updated last month
IntelLabs / lvlm-interpret
☆95 Updated 4 months ago