[TACL'23] VSR: A probing benchmark for spatial undersranding of vision-language models.
☆142Mar 25, 2023Updated 3 years ago
Alternatives and similar repositories for visual-spatial-reasoning
Users that are interested in visual-spatial-reasoning are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.
Sorting:
- ☆12Jan 10, 2025Updated last year
- Code and datasets for "What’s “up” with vision-language models? Investigating their struggle with spatial reasoning".☆71Feb 28, 2024Updated 2 years ago
- Code and datasets for "Text encoders are performance bottlenecks in contrastive vision-language models". Coming soon!☆11May 24, 2023Updated 2 years ago
- Grounding Language Models for Compositional and Spatial Reasoning☆18Oct 26, 2022Updated 3 years ago
- Github repository for "Why Is Spatial Reasoning Hard for VLMs? An Attention Mechanism Perspective on Focus Areas" (ICML 2025)☆71May 2, 2025Updated 10 months ago
- Wordpress hosting with auto-scaling on Cloudways • AdFully Managed hosting built for WordPress-powered businesses that need reliable, auto-scalable hosting. Cloudways SafeUpdates now available.
- ☆17Feb 20, 2023Updated 3 years ago
- COLA: Evaluate how well your vision-language model can Compose Objects Localized with Attributes!☆25Nov 23, 2024Updated last year
- [EMNLP 2021] Code and data for our paper "Visually Grounded Reasoning across Languages and Cultures"☆30Dec 30, 2021Updated 4 years ago
- VisualGPTScore for visio-linguistic reasoning☆27Oct 7, 2023Updated 2 years ago
- Benchmarking Multi-Image Understanding in Vision and Language Models☆12Jul 29, 2024Updated last year
- Code of the COLING22 paper "uChecker: Masked Pretrained Language Models as Unsupervised Chinese Spelling Checkers"☆19Aug 17, 2022Updated 3 years ago
- The official GitHub page for ''Evaluating Object Hallucination in Large Vision-Language Models''☆255Aug 21, 2025Updated 7 months ago
- [ICLR 2025 Oral] Official Implementation for "Do Vision-Language Models Represent Space and How? Evaluating Spatial Frame of Reference Un…☆21Oct 24, 2024Updated last year
- Code for "Are “Hierarchical” Visual Representations Hierarchical?" in NeurIPS Workshop for Symmetry and Geometry in Neural Representation…☆23Nov 8, 2023Updated 2 years ago
- Managed hosting for WordPress and PHP on Cloudways • AdManaged hosting with the flexibility to host WordPress, Magento, Laravel, or PHP apps, on multiple cloud providers. Cloudways by DigitalOcean.
- ☆58Apr 24, 2024Updated last year
- (CVPR2024)A benchmark for evaluating Multimodal LLMs using multiple-choice questions.☆362Jan 14, 2025Updated last year
- [ACL 2023] Code and data for our paper "Measuring Progress in Fine-grained Vision-and-Language Understanding"☆13Jun 11, 2023Updated 2 years ago
- Official PyTorch implementation of the paper "CoVR: Learning Composed Video Retrieval from Web Video Captions".☆120Oct 9, 2025Updated 5 months ago
- Composed Video Retrieval☆63May 2, 2024Updated last year
- [ICLR 2024] Analyzing and Mitigating Object Hallucination in Large Vision-Language Models☆155Apr 30, 2024Updated last year
- [NeurIPS'24] This repository is the implementation of "SpatialRGPT: Grounded Spatial Reasoning in Vision Language Models"☆317Dec 14, 2024Updated last year
- Compose multimodal datasets 🎹☆551Jan 5, 2026Updated 2 months ago
- Official repository of "Interactive Text-to-Image Retrieval with Large Language Models: A Plug-and-Play Approach" (ACL 2024 Oral)☆35Mar 24, 2025Updated last year
- Managed Database hosting by DigitalOcean • AdPostgreSQL, MySQL, MongoDB, Kafka, Valkey, and OpenSearch available. Automatically scale up storage and focus on building your apps.
- Data and code for NeurIPS 2021 Paper "IconQA: A New Benchmark for Abstract Diagram Understanding and Visual Language Reasoning".☆55Jan 28, 2024Updated 2 years ago
- FreeVA: Offline MLLM as Training-Free Video Assistant☆69Jun 9, 2024Updated last year
- [ICLR'24] Mitigating Hallucination in Large Multi-Modal Models via Robust Instruction Tuning☆296Mar 13, 2024Updated 2 years ago
- MultiInstruct: Improving Multi-Modal Zero-Shot Learning via Instruction Tuning☆134Jun 20, 2023Updated 2 years ago
- ☆13May 10, 2025Updated 10 months ago
- ☆87Apr 15, 2022Updated 3 years ago
- MM-Vet: Evaluating Large Multimodal Models for Integrated Capabilities (ICML 2024)☆323Jan 20, 2025Updated last year
- Harnessing 1.4M GPT4V-synthesized Data for A Lite Vision-Language Model☆281Jun 25, 2024Updated last year
- VPEval Codebase from Visual Programming for Text-to-Image Generation and Evaluation (NeurIPS 2023)☆45Nov 29, 2023Updated 2 years ago
- Wordpress hosting with auto-scaling on Cloudways • AdFully Managed hosting built for WordPress-powered businesses that need reliable, auto-scalable hosting. Cloudways SafeUpdates now available.
- Official PyTorch code of GroundVQA (CVPR'24)☆64Sep 13, 2024Updated last year
- [ACL 2024] Multi-modal preference alignment remedies regression of visual instruction tuning on language model☆47Nov 10, 2024Updated last year
- An official implementation for MS-DETR in ACL'23☆17Jun 3, 2023Updated 2 years ago
- Multimodal language model benchmark, featuring challenging examples☆186Dec 18, 2024Updated last year
- Official implementation of Leveraging Visual Tokens for Extended Text Contexts in Multi-Modal Learning☆28Oct 30, 2024Updated last year
- ☆59Aug 7, 2023Updated 2 years ago
- [ACL 2024 Findings] "TempCompass: Do Video LLMs Really Understand Videos?", Yuanxin Liu, Shicheng Li, Yi Liu, Yuxiang Wang, Shuhuai Ren, …☆131Apr 4, 2025Updated 11 months ago