cambridgeltl / visual-spatial-reasoning
[TACL'23] VSR: A probing benchmark for spatial understanding of vision-language models.
☆139 · Updated Mar 25, 2023
Alternatives and similar repositories for visual-spatial-reasoning
Users interested in visual-spatial-reasoning are comparing it to the libraries listed below.
- ☆12 · Updated Jan 10, 2025
- Code and datasets for "What’s “up” with vision-language models? Investigating their struggle with spatial reasoning". ☆70 · Updated Feb 28, 2024
- Benchmarking Multi-Image Understanding in Vision and Language Models ☆12 · Updated Jul 29, 2024
- Code and datasets for "Text encoders are performance bottlenecks in contrastive vision-language models". Coming soon! ☆11 · Updated May 24, 2023
- COLA: Evaluate how well your vision-language model can Compose Objects Localized with Attributes! ☆25 · Updated Nov 23, 2024
- VisualGPTScore for visio-linguistic reasoning ☆27 · Updated Oct 7, 2023
- GitHub repository for "Why Is Spatial Reasoning Hard for VLMs? An Attention Mechanism Perspective on Focus Areas" (ICML 2025) ☆68 · Updated May 2, 2025
- [ACL 2023] Code and data for our paper "Measuring Progress in Fine-grained Vision-and-Language Understanding" ☆13 · Updated Jun 11, 2023
- Code for "Are “Hierarchical” Visual Representations Hierarchical?" in NeurIPS Workshop for Symmetry and Geometry in Neural Representation… ☆22 · Updated Nov 8, 2023
- Composed Video Retrieval ☆62 · Updated May 2, 2024
- Official PyTorch implementation of the paper "CoVR: Learning Composed Video Retrieval from Web Video Captions". ☆118 · Updated Oct 9, 2025
- The official GitHub page for "Evaluating Object Hallucination in Large Vision-Language Models" ☆247 · Updated Aug 21, 2025
- (CVPR 2024) A benchmark for evaluating Multimodal LLMs using multiple-choice questions. ☆360 · Updated Jan 14, 2025
- Data and code for NeurIPS 2021 paper "IconQA: A New Benchmark for Abstract Diagram Understanding and Visual Language Reasoning". ☆55 · Updated Jan 28, 2024
- Grounding Language Models for Compositional and Spatial Reasoning ☆18 · Updated Oct 26, 2022
- ☆58 · Updated Apr 24, 2024
- VPEval codebase from "Visual Programming for Text-to-Image Generation and Evaluation" (NeurIPS 2023) ☆45 · Updated Nov 29, 2023
- [ICLR 2024] Analyzing and Mitigating Object Hallucination in Large Vision-Language Models ☆155 · Updated Apr 30, 2024
- [ACL 2024] Multi-modal preference alignment remedies regression of visual instruction tuning on language model ☆47 · Updated Nov 10, 2024
- [EMNLP 2021] Code and data for our paper "Visually Grounded Reasoning across Languages and Cultures" ☆30 · Updated Dec 30, 2021
- Harnessing 1.4M GPT4V-synthesized Data for A Lite Vision-Language Model ☆281 · Updated Jun 25, 2024
- ☆360 · Updated Jan 27, 2024
- FreeVA: Offline MLLM as Training-Free Video Assistant ☆68 · Updated Jun 9, 2024
- ☆17 · Updated Feb 20, 2023
- [ICLR 2025 Oral] Official implementation for "Do Vision-Language Models Represent Space and How? Evaluating Spatial Frame of Reference Un… ☆18 · Updated Oct 24, 2024
- Official This-Is-My Dataset published in CVPR 2023 ☆16 · Updated Jul 18, 2024
- [ICLR'24] Mitigating Hallucination in Large Multi-Modal Models via Robust Instruction Tuning ☆296 · Updated Mar 13, 2024
- Official PyTorch code of GroundVQA (CVPR'24) ☆64 · Updated Sep 13, 2024
- Official repository of "Interactive Text-to-Image Retrieval with Large Language Models: A Plug-and-Play Approach" (ACL 2024 Oral) ☆34 · Updated Mar 24, 2025
- Compose multimodal datasets 🎹 ☆545 · Updated Jan 5, 2026
- Sapsucker Woods 60 Audiovisual Dataset ☆17 · Updated Oct 7, 2022
- Official implementation of "Fine-Tuning is Fine, if Calibrated." (NeurIPS 2024) ☆20 · Updated Apr 25, 2025
- Implementation of CounterCurate, the data curation pipeline for both physical and semantic counterfactual image-caption pairs. ☆19 · Updated Jun 27, 2024
- Code of the COLING22 paper "uChecker: Masked Pretrained Language Models as Unsupervised Chinese Spelling Checkers" ☆19 · Updated Aug 17, 2022
- ☆43 · Updated Mar 8, 2021
- ☆86 · Updated Apr 15, 2022
- MM-Vet: Evaluating Large Multimodal Models for Integrated Capabilities (ICML 2024) ☆322 · Updated Jan 20, 2025
- Code for "CLIP Behaves like a Bag-of-Words Model Cross-modally but not Uni-modally" ☆19 · Updated Feb 14, 2025
- If CLIP Could Talk: Understanding Vision-Language Model Representations Through Their Preferred Concept Descriptions ☆17 · Updated Apr 4, 2024