cambridgeltl / visual-spatial-reasoningLinks

[TACL'23] VSR: A probing benchmark for spatial undersranding of vision-language models.

☆130

Alternatives and similar repositories for visual-spatial-reasoning

Users that are interested in visual-spatial-reasoning are comparing it to the libraries listed below

Sorting:

allenai / aokvqa
Official repository for the A-OKVQA dataset
☆100Updated last year
limanling / KnowledgeVL-Reading
☆67Updated 2 years ago
MikeWangWZHL / Paxion
Repo for paper: "Paxion: Patching Action Knowledge in Video-Language Foundation Models" Neurips 23 Spotlight
☆37Updated 2 years ago
shizhediao / DaVinci
Source code for the paper "Prefix Language Models are Unified Modal Learners"
☆42Updated 2 years ago
ajd12342 / why-winoground-hard
Code for 'Why is Winoground Hard? Investigating Failures in Visuolinguistic Compositionality', EMNLP 2022
☆31Updated 2 years ago
YiyangZhou / POVID
[Arxiv] Aligning Modalities in Vision Large Language Models via Preference Fine-tuning
☆88Updated last year
RAIVNLab / sugar-crepe
[NeurIPS 2023] A faithful benchmark for vision-language compositionality
☆86Updated last year
amitakamath / whatsup_vlms
Code and datasets for "What’s “up” with vision-language models? Investigating their struggle with spatial reasoning".
☆62Updated last year
MikeWangWZHL / VidIL
Pytorch code for Language Models with Image Descriptors are Strong Few-Shot Video-Language Learners
☆115Updated 3 years ago
edchengg / oven_eval
ICCV 2023 (Oral) Open-domain Visual Entity Recognition Towards Recognizing Millions of Wikipedia Entities
☆43Updated 4 months ago
TencentARC / GVT
Official code for "What Makes for Good Visual Tokenizers for Large Language Models?".
☆58Updated 2 years ago
bcdnlp / FAITHSCORE
FaithScore: Fine-grained Evaluations of Hallucinations in Large Vision-Language Models
☆30Updated 7 months ago
allenai / unified-io-inference
☆227Updated last year
RifleZhang / LLaVA-Hound-DPO
☆155Updated 11 months ago
PLUM-Lab / MultiInstruct
MultiInstruct: Improving Multi-Modal Zero-Shot Learning via Instruction Tuning
☆134Updated 2 years ago
UMass-Embodied-AGI / CoVLM
[ICLR 2023] CoVLM: Composing Visual Entities and Relationships in Large Language Models Via Communicative Decoding
☆45Updated 4 months ago
LisaAnne / Hallucination
☆81Updated 6 years ago
google-deepmind / svo_probes
The SVO-Probes Dataset for Verb Understanding
☆31Updated 3 years ago
lupantech / IconQA
Data and code for NeurIPS 2021 Paper "IconQA: A New Benchmark for Abstract Diagram Understanding and Visual Language Reasoning".
☆52Updated last year
mlfoundations / VisIT-Bench
☆50Updated last year
Hxyou / IdealGPT
Official Code of IdealGPT
☆35Updated 2 years ago
cliangyu / Cola
[NeurIPS2023] Official implementation of the paper "Large Language Models are Visual Reasoning Coordinators"
☆103Updated last year
YiyangZhou / LURE
[ICLR 2024] Analyzing and Mitigating Object Hallucination in Large Vision-Language Models
☆149Updated last year
pipilurj / bootstrapped-preference-optimization-BPO
code for "Strengthening Multimodal Large Language Model with Bootstrapped Preference Optimization"
☆59Updated last year
ylsung / VL_adapter
PyTorch code for "VL-Adapter: Parameter-Efficient Transfer Learning for Vision-and-Language Tasks" (CVPR2022)
☆207Updated 2 years ago
zeyofu / BLINK_Benchmark
This repo contains evaluation code for the paper "BLINK: Multimodal Large Language Models Can See but Not Perceive". https://arxiv.or…
☆147Updated 3 weeks ago
ChenDelong1999 / polite-flamingo
🦩 Visual Instruction Tuning with Polite Flamingo - training multi-modal LLMs to be both clever and polite! (AAAI-24 Oral)
☆63Updated last year
microsoft / PICa
An Empirical Study of GPT-3 for Few-Shot Knowledge-Based VQA, AAAI 2022 (Oral)
☆85Updated 3 years ago
microsoft / UniTAB
UniTAB: Unifying Text and Box Outputs for Grounded VL Modeling, ECCV 2022 (Oral Presentation)
☆88Updated 2 years ago
vlf-silkie / VLFeedback
☆100Updated last year