csebuetnlp / IllusionVQALinks
This repository contains the data and code of the paper titled "IllusionVQA: A Challenging Optical Illusion Dataset for Vision Language Models"
☆22Updated 6 months ago
Alternatives and similar repositories for IllusionVQA
Users that are interested in IllusionVQA are comparing it to the libraries listed below
Sorting:
- This repo contains evaluation code for the paper "BLINK: Multimodal Large Language Models Can See but Not Perceive". https://arxiv.or…☆147Updated last month
- Code and datasets for "What’s “up” with vision-language models? Investigating their struggle with spatial reasoning".☆64Updated last year
- Code and data for the paper: Learning Action and Reasoning-Centric Image Editing from Videos and Simulation☆29Updated 4 months ago
- [ICLR 2025] Source code for paper "A Spark of Vision-Language Intelligence: 2-Dimensional Autoregressive Transformer for Efficient Finegr…☆77Updated 10 months ago
- VPEval Codebase from Visual Programming for Text-to-Image Generation and Evaluation (NeurIPS 2023)☆44Updated last year
- [CVPR24 Highlights] Polos: Multimodal Metric Learning from Human Feedback for Image Captioning☆31Updated 5 months ago
- [ACL2023, Findings] Source codes for the paper "Werewolf Among Us: Multimodal Resources for Modeling Persuasion Behaviors in Social Deduc…☆16Updated 8 months ago
- Official implementation of "Describing Differences in Image Sets with Natural Language" (CVPR 2024 Oral)☆126Updated last year
- Official implementation of our paper "Finetuned Multimodal Language Models are High-Quality Image-Text Data Filters".☆67Updated 6 months ago
- Matryoshka Multimodal Models☆114Updated 9 months ago
- ☆49Updated 2 years ago
- [CVPR 2024] The official implementation of paper "synthesize, diagnose, and optimize: towards fine-grained vision-language understanding"☆49Updated 4 months ago
- Official implementation and dataset for the NAACL 2024 paper "ComCLIP: Training-Free Compositional Image and Text Matching"☆36Updated last year
- [Arxiv] Aligning Modalities in Vision Large Language Models via Preference Fine-tuning☆88Updated last year
- Preference Learning for LLaVA☆52Updated 11 months ago
- [ICCV 2025] Auto Interpretation Pipeline and many other functionalities for Multimodal SAE Analysis.☆159Updated last month
- Visual Programming for Text-to-Image Generation and Evaluation (NeurIPS 2023)☆56Updated 2 years ago
- Repository for the paper: dense and aligned captions (dac) promote compositional reasoning in vl models☆27Updated last year
- A Comprehensive Benchmark for Robust Multi-image Understanding☆15Updated last year
- ☆46Updated 2 years ago
- 🔥 [NeurIPS 2025] Official implementation of "Generate, but Verify: Reducing Visual Hallucination in Vision-Language Models with Retrospe…☆43Updated last month
- Official repository of paper "Subobject-level Image Tokenization" (ICML-25)☆88Updated 4 months ago
- Code for the paper "AutoPresent: Designing Structured Visuals From Scratch" (CVPR 2025)☆134Updated 5 months ago
- [NeurIPS 2023] A faithful benchmark for vision-language compositionality☆86Updated last year
- ☆16Updated last year
- ☆23Updated last year
- Github repository for "Bring Reason to Vision: Understanding Perception and Reasoning through Model Merging" (ICML 2025)☆77Updated last month
- [ACL2025] Unsolvable Problem Detection: Robust Understanding Evaluation for Large Multimodal Models☆78Updated 5 months ago
- Implementation and dataset for paper "Can MLLMs Perform Text-to-Image In-Context Learning?"☆41Updated 5 months ago
- [ACL2025 Findings] Benchmarking Multihop Multimodal Internet Agents☆47Updated 8 months ago