bowen-upenn / Multi-Agent-VQALinks

[CVPR 2024 CVinW] Multi-Agent VQA: Exploring Multi-Agent Foundation Models on Zero-Shot Visual Question Answering

☆20

Alternatives and similar repositories for Multi-Agent-VQA

Users that are interested in Multi-Agent-VQA are comparing it to the libraries listed below

Sorting:

Shengcao-Cao / groundLMM
Emergent Visual Grounding in Large Multimodal Models Without Grounding Supervision
☆41Updated last month
lixinustc / GraphAdapter
The efficient tuning method for VLMs
☆80Updated last year
SooLab / DDCOT
[NeurIPS 2023]DDCoT: Duty-Distinct Chain-of-Thought Prompting for Multimodal Reasoning in Language Models
☆49Updated last year
YiyangZhou / POVID
[Arxiv] Aligning Modalities in Vision Large Language Models via Preference Fine-tuning
☆89Updated last year
mrwu-mac / R-Bench
[ICML2024] Repo for the paper `Evaluating and Analyzing Relationship Hallucinations in Large Vision-Language Models'
☆22Updated 11 months ago
si0wang / ViCrit
☆24Updated 5 months ago
Letian2003 / C-VQA
Counterfactual Reasoning VQA Dataset
☆27Updated 2 years ago
om-ai-lab / ZoomEye
[EMNLP-2025 Oral] ZoomEye: Enhancing Multimodal LLMs with Human-Like Zooming Capabilities through Tree-Based Image Exploration
☆62Updated 2 weeks ago
CVMI-Lab / clip-beyond-tail
(NeurIPS 2024) What Makes CLIP More Robust to Long-Tailed Pre-Training Data? A Controlled Study for Transferable Insights
☆29Updated last year
Hxyou / IdealGPT
Official Code of IdealGPT
☆35Updated 2 years ago
m1k2zoo / negbench
Evaluation and dataset construction code for the CVPR 2025 paper "Vision-Language Models Do Not Understand Negation"
☆41Updated 7 months ago
StanfordMIMI / villa
[ICCV 2023] ViLLA: Fine-grained vision-language representation learning from real-world data
☆46Updated 2 years ago
wjpoom / SPEC
[CVPR 2024] The official implementation of paper "synthesize, diagnose, and optimize: towards fine-grained vision-language understanding"
☆49Updated 5 months ago
heliossun / SQ-LLaVA
Visual self-questioning for large vision-language assistant.
☆45Updated 4 months ago
jmiemirza / Meta-Prompting
Meta-Prompting for Automating Zero-shot Visual Recognition with LLMs (ECCV 2024)
☆19Updated last year
OpenGVLab / Multitask-Model-Selector
[NIPS2023]Implementation of Foundation Model is Efficient Multimodal Multitask Model Selector
☆37Updated last year
UMass-Embodied-AGI / CoVLM
[ICLR 2023] CoVLM: Composing Visual Entities and Relationships in Large Language Models Via Communicative Decoding
☆46Updated 5 months ago
mlvlab / OVQA
Open-Vocabulary Video Question Answering: A New Benchmark for Evaluating the Generalizability of Video Question Answering Models (ICCV 20…
☆18Updated last year
rabiulcste / vqazero
visual question answering prompting recipes for large vision-language models
☆28Updated last year
YiyangZhou / CSR
[NeurIPS 2024] Calibrated Self-Rewarding Vision Language Models
☆81Updated last month
mlvlab / RALF
Official implementation of CVPR 2024 paper "Retrieval-Augmented Open-Vocabulary Object Detection".
☆44Updated last year
waltonfuture / MM-UPT
[NeurIPS 2025] Unsupervised Post-Training for Multi-Modal LLM Reasoning via GRPO
☆69Updated last month
Ruiyang-061X / Awesome-MLLM-Uncertainty
✨A curated list of papers on the uncertainty in multi-modal large language model (MLLM).
☆56Updated 8 months ago
isekai-portal / Link-Context-Learning
☆100Updated last year
yuhui-zh15 / VLMClassifier
Official implementation of "Why are Visually-Grounded Language Models Bad at Image Classification?" (NeurIPS 2024)
☆93Updated last year
eric-ai-lab / ComCLIP
Official implementation and dataset for the NAACL 2024 paper "ComCLIP: Training-Free Compositional Image and Text Matching"
☆37Updated last year
iancovert / locality-alignment
☆53Updated 10 months ago
archiki / RepARe
☆21Updated 2 years ago
YuxiXie / V-DPO
Preference Learning for LLaVA
☆56Updated last year
thunlp / DeepPerception
DeepPerception: Advancing R1-like Cognitive Visual Perception in MLLMs for Knowledge-Intensive Visual Grounding
☆65Updated 5 months ago