mertyg / vision-language-models-are-bows
Experiments and data for the paper "When and why vision-language models behave like bags-of-words, and what to do about it?" Oral @ ICLR 2023
☆276 · Updated last year
Alternatives and similar repositories for vision-language-models-are-bows:
Users who are interested in vision-language-models-are-bows are comparing it to the repositories listed below.
- [NeurIPS 2023] A faithful benchmark for vision-language compositionality ☆79 · Updated last year
- Official implementation of "Interpreting CLIP's Image Representation via Text-Based Decomposition" ☆208 · Updated 5 months ago
- ☆328 · Updated last year
- [NeurIPS 2023] Text data, code and pre-trained models for paper "Improving CLIP Training with Language Rewrites" ☆278 · Updated last year
- The official GitHub page for "Evaluating Object Hallucination in Large Vision-Language Models" ☆206 · Updated last year
- [ICLR'24] Mitigating Hallucination in Large Multi-Modal Models via Robust Instruction Tuning ☆276 · Updated last year
- [CVPR 2024 Highlight] Mitigating Object Hallucinations in Large Vision-Language Models through Visual Contrastive Decoding ☆272 · Updated 6 months ago
- ☆168 · Updated last year
- [CVPR'24] HallusionBench: You See What You Think? Or You Think What You See? An Image-Context Reasoning Benchmark Challenging for GPT-4V(… ☆281 · Updated 5 months ago
- VLM Evaluation: Benchmark for VLMs, spanning text generation tasks from VQA to Captioning ☆110 · Updated 7 months ago
- Official implementation of "Why are Visually-Grounded Language Models Bad at Image Classification?" (NeurIPS 2024) ☆79 · Updated 6 months ago
- [NeurIPS 2023] Self-Chained Image-Language Model for Video Localization and Question Answering ☆187 · Updated last year
- Visualizing the attention of vision-language models ☆165 · Updated 2 months ago
- ☆144 · Updated 6 months ago
- [EMNLP'23] The official GitHub page for "Evaluating Object Hallucination in Large Vision-Language Models" ☆83 · Updated last year
- Up-to-date curated list of state-of-the-art large vision-language model hallucination research work, papers & resources ☆125 · Updated 3 weeks ago
- [CVPR2024] ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts ☆319 · Updated 9 months ago
- PyTorch code for "VL-Adapter: Parameter-Efficient Transfer Learning for Vision-and-Language Tasks" (CVPR2022) ☆206 · Updated 2 years ago
- SVIT: Scaling up Visual Instruction Tuning ☆163 · Updated 10 months ago
- Reproducible scaling laws for contrastive language-image learning (https://arxiv.org/abs/2212.07143) ☆165 · Updated last year
- Densely Captioned Images (DCI) dataset repository. ☆180 · Updated 10 months ago
- [NeurIPS 2024] MoVA: Adapting Mixture of Vision Experts to Multimodal Context ☆154 · Updated 7 months ago
- ☆188 · Updated last year
- Code and datasets for "What’s “up” with vision-language models? Investigating their struggle with spatial reasoning". ☆45 · Updated last year
- A collection of visual instruction tuning datasets. ☆76 · Updated last year
- All-In-One VLM: Image + Video + Transfer to Other Languages / Domains (TPAMI 2023) ☆162 · Updated 8 months ago
- (CVPR2024) A benchmark for evaluating Multimodal LLMs using multiple-choice questions. ☆338 · Updated 3 months ago
- Beyond Hallucinations: Enhancing LVLMs through Hallucination-Aware Direct Preference Optimization ☆87 · Updated last year
- An LLM-free Multi-dimensional Benchmark for Multi-modal Hallucination Evaluation ☆117 · Updated last year
- [ICLR 2024] Analyzing and Mitigating Object Hallucination in Large Vision-Language Models ☆146 · Updated last year