lupantech / IconQA

Data and code for NeurIPS 2021 Paper "IconQA: A New Benchmark for Abstract Diagram Understanding and Visual Language Reasoning".

☆52

Alternatives and similar repositories for IconQA

Users who are interested in IconQA compare it to the repositories listed below.

allenai / gpv2
☆32 Updated 3 years ago
shizhediao / DaVinci
Source code for the paper "Prefix Language Models are Unified Modal Learners"
☆42 Updated 2 years ago
guilk / VLC
Research code for "Training Vision-Language Transformers from Captions Alone"
☆34 Updated 3 years ago
cambridgeltl / visual-spatial-reasoning
[TACL'23] VSR: A probing benchmark for spatial understanding of vision-language models.
☆131 Updated 2 years ago
Victorwz / VaLM
VaLM: Visually-augmented Language Modeling. ICLR 2023.
☆56 Updated 2 years ago
ajd12342 / why-winoground-hard
Code for 'Why is Winoground Hard? Investigating Failures in Visuolinguistic Compositionality', EMNLP 2022
☆31 Updated 2 years ago
limanling / KnowledgeVL-Reading
☆67 Updated 2 years ago
allenai / grit_official
Official repository for the General Robust Image Task (GRIT) Benchmark
☆54 Updated 2 years ago
guilk / KAT
Research code for "KAT: A Knowledge Augmented Transformer for Vision-and-Language"
☆68 Updated 3 years ago
princetonvisualai / pointingqa
Code for paper "Point and Ask: Incorporating Pointing into Visual Question Answering"
☆19 Updated 3 years ago
sanjayss34 / codevqa
☆84 Updated 2 years ago
mlfoundations / VisIT-Bench
☆50 Updated 2 years ago
YujieLu10 / TIP
Multimodal-Procedural-Planning
☆92 Updated 2 years ago
zmykevin / UVLP
CVPR 2022 (Oral) PyTorch code for Unsupervised Vision-and-Language Pre-training via Retrieval-based Multi-Granular Alignment
☆22 Updated 3 years ago
microsoft / UniTAB
UniTAB: Unifying Text and Box Outputs for Grounded VL Modeling, ECCV 2022 (Oral Presentation)
☆89 Updated 2 years ago
jmerullo / limber
https://arxiv.org/abs/2209.15162
☆52 Updated 2 years ago
PLUM-Lab / MultiInstruct
MultiInstruct: Improving Multi-Modal Zero-Shot Learning via Instruction Tuning
☆134 Updated 2 years ago
Hxyou / IdealGPT
Official Code of IdealGPT
☆35 Updated 2 years ago
MikeWangWZHL / VidIL
PyTorch code for "Language Models with Image Descriptors are Strong Few-Shot Video-Language Learners"
☆115 Updated 3 years ago
umd-huang-lab / Mementos
☆31 Updated last year
google-deepmind / svo_probes
The SVO-Probes Dataset for Verb Understanding
☆31 Updated 3 years ago
MichaelZhouwang / VLUE
This repo contains code and instructions for baselines in the VLUE benchmark.
☆41 Updated 3 years ago
pleaseconnectwifi / DANCE
PyTorch code for Improving Commonsense in Vision-Language Models via Knowledge Graph Riddles (DANCE)
☆23 Updated 2 years ago
Yangyi-Chen / CoTConsistency
The released data for the paper "Measuring and Improving Chain-of-Thought Reasoning in Vision-Language Models".
☆34 Updated 2 years ago
NewsStoriesData / newsstories.github.io
☆22 Updated 3 years ago
redcaps-dataset / redcaps-downloader
Command-line tool for downloading and extending the RedCaps dataset.
☆49 Updated last year
HYPJUDY / Sparkles
Sparkles: Unlocking Chats Across Multiple Images for Multimodal Instruction-Following Models
☆44 Updated last year
zinengtang / Perceiver_VL
PyTorch code for "Perceiver-VL: Efficient Vision-and-Language Modeling with Iterative Latent Attention" (WACV 2023)
☆33 Updated 2 years ago
facebookresearch / diht
Filtering, Distillation, and Hard Negatives for Vision-Language Pre-Training
☆138 Updated 2 years ago
manoja328 / TallyQA_dataset
TallyQA: Answering Complex Counting Questions dataset
☆27 Updated last year