hughplay / Visual-Reasoning-Papers
A curated list of visual reasoning papers.
☆31 · Updated 3 months ago
Alternatives and similar repositories for Visual-Reasoning-Papers
Users interested in Visual-Reasoning-Papers are comparing it to the repositories listed below.
- [Arxiv] Aligning Modalities in Vision Large Language Models via Preference Fine-tuning · ☆90 · Updated last year
- [TACL'23] VSR: A probing benchmark for spatial understanding of vision-language models. · ☆139 · Updated 2 years ago
- Official repository for the A-OKVQA dataset · ☆109 · Updated last year
- [ICLR'24] Mitigating Hallucination in Large Multi-Modal Models via Robust Instruction Tuning · ☆296 · Updated last year
- [ICLR 2025] Video-STaR: Self-Training Enables Video Instruction Tuning with Any Supervision · ☆72 · Updated last year
- ☆231 · Updated 2 years ago
- [COLM-2024] List Items One by One: A New Data Source and Learning Paradigm for Multimodal LLMs · ☆145 · Updated last year
- Experiments and data for the paper "When and why vision-language models behave like bags-of-words, and what to do about it?" Oral @ ICLR … · ☆291 · Updated 2 years ago
- Enhancing Large Vision Language Models with Self-Training on Image Comprehension. · ☆69 · Updated last year
- [ACL 2024] PCA-Bench: Evaluating Multimodal Large Language Models in Perception-Cognition-Action Chain · ☆106 · Updated last year
- [ICLR 2024] Analyzing and Mitigating Object Hallucination in Large Vision-Language Models · ☆155 · Updated last year
- ☆67 · Updated 2 years ago
- [NeurIPS 2024] A task generation and model evaluation system for multimodal language models. · ☆73 · Updated last year
- Code for our ACL 2025 paper "Language Repository for Long Video Understanding" · ☆34 · Updated last year
- ☆117 · Updated 6 months ago
- Code and datasets for "What's 'up' with vision-language models? Investigating their struggle with spatial reasoning". · ☆70 · Updated last year
- Beyond Hallucinations: Enhancing LVLMs through Hallucination-Aware Direct Preference Optimization · ☆100 · Updated 2 years ago
- ☆100 · Updated last year
- Code for the paper "AutoPresent: Designing Structured Visuals From Scratch" (CVPR 2025) · ☆154 · Updated 8 months ago
- This repo contains evaluation code for the paper "BLINK: Multimodal Large Language Models Can See but Not Perceive". https://arxiv.or… · ☆159 · Updated 4 months ago
- [ICLR 2024 (Spotlight)] "Frozen Transformers in Language Models are Effective Visual Encoder Layers" · ☆247 · Updated 2 years ago
- Source code for the paper "Mind the Gap: Benchmarking Spatial Reasoning in Vision-Language Models" · ☆18 · Updated last week
- ☆155 · Updated last year
- [EMNLP'23] The official GitHub page for "Evaluating Object Hallucination in Large Vision-Language Models" · ☆105 · Updated 5 months ago
- Reinforcement Learning of Vision Language Models with Self Visual Perception Reward · ☆160 · Updated 4 months ago
- [CVPR'24] HallusionBench: You See What You Think? Or You Think What You See? An Image-Context Reasoning Benchmark Challenging for GPT-4V(… · ☆325 · Updated 3 months ago
- [EMNLP 2023] InfoSeek: A New VQA Benchmark focused on Visual Info-Seeking Questions · ☆25 · Updated last year
- ☆30 · Updated last year
- MLLM-Tool: A Multimodal Large Language Model For Tool Agent Learning · ☆138 · Updated 4 months ago
- ☆360 · Updated 2 years ago