hughplay / Visual-Reasoning-Papers
A curated list of visual reasoning papers.
☆26 · Updated last month
Alternatives and similar repositories for Visual-Reasoning-Papers
Users who are interested in Visual-Reasoning-Papers are comparing it to the libraries listed below
- Beyond Hallucinations: Enhancing LVLMs through Hallucination-Aware Direct Preference Optimization ☆87 · Updated last year
- ☆97 · Updated last year
- [arXiv] Aligning Modalities in Vision Large Language Models via Preference Fine-tuning ☆83 · Updated last year
- Counterfactual Reasoning VQA Dataset ☆25 · Updated last year
- Enhancing Large Vision Language Models with Self-Training on Image Comprehension ☆66 · Updated 11 months ago
- Code and datasets for "What's 'up' with vision-language models? Investigating their struggle with spatial reasoning" ☆48 · Updated last year
- Official implementation for CoVLM: Composing Visual Entities and Relationships in Large Language Models Via Communicative Decoding ☆45 · Updated last year
- Emerging Pixel Grounding in Large Multimodal Models Without Grounding Supervision ☆41 · Updated last month
- Official repository for the A-OKVQA dataset ☆84 · Updated last year
- [ICLR 2025] Video-STaR: Self-Training Enables Video Instruction Tuning with Any Supervision ☆62 · Updated 10 months ago
- [SCIS 2024] The official implementation of the paper "MMInstruct: A High-Quality Multi-Modal Instruction Tuning Dataset with Extensive Di…" ☆51 · Updated 6 months ago
- [ICLR 2025] Official PyTorch Model "Visual Haystacks: A Vision-Centric Needle-In-A-Haystack Benchmark" ☆15 · Updated 3 months ago
- A curated list of research in object-centric learning ☆11 · Updated 7 months ago
- ☆69 · Updated 5 months ago
- ☆65 · Updated 10 months ago
- MAmmoTH-VL: Eliciting Multimodal Reasoning with Instruction Tuning at Scale ☆43 · Updated 5 months ago
- ☆41 · Updated 4 months ago
- Experiments and data for the paper "When and why vision-language models behave like bags-of-words, and what to do about it?" Oral @ ICLR … ☆276 · Updated last year
- Official code for "What Makes for Good Visual Tokenizers for Large Language Models?" ☆58 · Updated last year
- [ICML 2024] Repo for the paper "Evaluating and Analyzing Relationship Hallucinations in Large Vision-Language Models" ☆20 · Updated 4 months ago
- Distilling Large Vision-Language Model with Out-of-Distribution Generalizability (ICCV 2023) ☆56 · Updated last year
- Official Code of IdealGPT ☆35 · Updated last year
- Code for the paper titled "CiT: Curation in Training for Effective Vision-Language Data" ☆78 · Updated 2 years ago
- The official GitHub page for "Evaluating Object Hallucination in Large Vision-Language Models" ☆207 · Updated last year
- Official implementation for "A Simple LLM Framework for Long-Range Video Question-Answering" ☆96 · Updated 6 months ago
- [CVPR 2022] Visual Abductive Reasoning ☆122 · Updated 6 months ago
- Official Code for ACL 2023 Outstanding Paper: World-to-Words: Grounded Open Vocabulary Acquisition through Fast Mapping in Vision-Languag… ☆32 · Updated last year
- Unsolvable Problem Detection: Evaluating Trustworthiness of Vision Language Models ☆77 · Updated this week
- ☆43 · Updated 3 weeks ago
- ☆75 · Updated 4 months ago