Maitreyapatel / CRIPP-VQA
CRIPP-VQA Benchmark -- EMNLP, 2022
☆9Updated 2 years ago
Alternatives and similar repositories for CRIPP-VQA
Users who are interested in CRIPP-VQA are comparing it to the repositories listed below.
- [DMLR 2024] Benchmarking Robustness of Multimodal Image-Text Models under Distribution Shift☆37Updated last year
- ☆68Updated 2 years ago
- Differentiable First-Order Logic Reasoning for Visual Question Answering☆40Updated 4 years ago
- Code for 'Why is Winoground Hard? Investigating Failures in Visuolinguistic Compositionality', EMNLP 2022☆30Updated 2 years ago
- This repo contains the pytorch implementation for Dynamic Concept Learner (accepted by ICLR 2021).☆37Updated last year
- ☆16Updated 3 months ago
- [TACL'23] VSR: A probing benchmark for spatial understanding of vision-language models.☆127Updated 2 years ago
- Code for ICCV2021 paper: Calibrating Concepts and Operations: Towards Symbolic Reasoning on Real Images☆14Updated 2 years ago
- ☆75Updated 6 years ago
- Abstract Spatial-Temporal Reasoning via Probabilistic Abduction and Execution☆26Updated 4 years ago
- ☆39Updated 2 years ago
- The Continual Learning in Multimodality Benchmark☆67Updated 2 years ago
- [Findings of EMNLP 2022] AssistSR: Task-oriented Video Segment Retrieval for Personal AI Assistant☆23Updated last year
- Code for WACV 2021 Paper "Meta Module Network for Compositional Visual Reasoning"☆43Updated 4 years ago
- Source code for the paper "Prefix Language Models are Unified Modal Learners"☆43Updated 2 years ago
- GQA-OOD is a new dataset and benchmark for the evaluation of VQA models in OOD (out of distribution) settings.☆29Updated 4 years ago
- Official codebase for ICLR oral paper Unsupervised Vision-Language Grammar Induction with Shared Structure Modeling☆36Updated 3 years ago
- Episodic Transformer (E.T.) is a novel attention-based architecture for vision-and-language navigation. E.T. is based on a multimodal tra…☆90Updated 2 years ago
- An Empirical Study of GPT-3 for Few-Shot Knowledge-Based VQA, AAAI 2022 (Oral)☆85Updated 3 years ago
- Repo for ICCV 2021 paper: Beyond Question-Based Biases: Assessing Multimodal Shortcut Learning in Visual Question Answering☆27Updated last year
- Transformation Driven Visual Reasoning - CVPR 2021☆37Updated 2 years ago
- Code for paper "Point and Ask: Incorporating Pointing into Visual Question Answering"☆19Updated 2 years ago
- Official code repo for "ProTo: program-guided Transformers for Program-guided Tasks"☆21Updated 3 years ago
- [ICLR2024] (EvALign-ICL Benchmark) Beyond Task Performance: Evaluating and Reducing the Flaws of Large Multimodal Models with In-Context …☆22Updated last year
- Pytorch code for Language Models with Image Descriptors are Strong Few-Shot Video-Language Learners☆115Updated 2 years ago
- The SVO-Probes Dataset for Verb Understanding☆31Updated 3 years ago
- Data and code for NeurIPS 2021 Paper "IconQA: A New Benchmark for Abstract Diagram Understanding and Visual Language Reasoning".☆52Updated last year
- Code for the ICCV'21 paper "Context-aware Scene Graph Generation with Seq2Seq Transformers"☆43Updated 3 years ago
- Official Code Release for "Diagnosing and Rectifying Vision Models using Language" (ICLR 2023)☆34Updated 2 years ago
- Code, data, models for the Sherlock corpus☆58Updated 2 years ago