bowen-upenn / Multi-Agent-VQA
[CVPR 2024 CVinW] Multi-Agent VQA: Exploring Multi-Agent Foundation Models on Zero-Shot Visual Question Answering
☆10 · Updated 5 months ago
Alternatives and similar repositories for Multi-Agent-VQA:
Users interested in Multi-Agent-VQA are comparing it to the repositories listed below.
- Official Code of IdealGPT · ☆34 · Updated last year
- [ICML 2024] Repo for the paper "Evaluating and Analyzing Relationship Hallucinations in Large Vision-Language Models" · ☆20 · Updated 2 months ago
- Emerging Pixel Grounding in Large Multimodal Models Without Grounding Supervision · ☆31 · Updated 4 months ago
- [NeurIPS 2023] DDCoT: Duty-Distinct Chain-of-Thought Prompting for Multimodal Reasoning in Language Models · ☆38 · Updated 11 months ago
- ZoomEye: Enhancing Multimodal LLMs with Human-Like Zooming Capabilities through Tree-Based Image Exploration · ☆24 · Updated 2 months ago
- Counterfactual Reasoning VQA Dataset · ☆24 · Updated last year
- Visual question answering prompting recipes for large vision-language models · ☆24 · Updated 5 months ago
- COLA: Evaluate how well your vision-language model can Compose Objects Localized with Attributes! · ☆24 · Updated 3 months ago
- Code for the paper "Point and Ask: Incorporating Pointing into Visual Question Answering" · ☆18 · Updated 2 years ago
- Look, Compare, Decide: Alleviating Hallucination in Large Vision-Language Models via Multi-View Multi-Path Reasoning · ☆19 · Updated 5 months ago
- "Worse than Random? An Embarrassingly Simple Probing Evaluation of Large Multimodal Models in Medical VQA" · ☆15 · Updated last week
- [ICLR 2025] MLLM can see? Dynamic Correction Decoding for Hallucination Mitigation · ☆38 · Updated 2 months ago
- Code and data for the ACL 2024 paper "Cross-Modal Projection in Multimodal LLMs Doesn't Really Project Visual Attributes to Textual Space" · ☆13 · Updated 7 months ago
- [CVPR 2023] Q: How to Specialize Large Vision-Language Models to Data-Scarce VQA Tasks? A: Self-Train on Unlabeled Images! · ☆15 · Updated 9 months ago
- [ICCV 2023] ViLLA: Fine-grained vision-language representation learning from real-world data · ☆39 · Updated last year
- [CVPR 2025] Mitigating Object Hallucinations in Large Vision-Language Models with Assembly of Global and Local Attention · ☆22 · Updated 7 months ago
- [ICLR 2025] VL-ICL Bench: The Devil in the Details of Multimodal In-Context Learning · ☆46 · Updated 3 weeks ago
- Enhancing Large Vision Language Models with Self-Training on Image Comprehension · ☆63 · Updated 9 months ago
- Repository for the paper "Teaching VLMs to Localize Specific Objects from In-context Examples" · ☆22 · Updated 3 months ago
- [ACL 2023] MultiCapCLIP: Auto-Encoding Prompts for Zero-Shot Multilingual Visual Captioning · ☆35 · Updated 6 months ago
- MMICL, a state-of-the-art VLM with in-context learning (ICL) ability, from PKU · ☆46 · Updated last year
- SVL-Adapter: Self-Supervised Adapter for Vision-Language Pretrained Models · ☆20 · Updated last year
- [arXiv] Aligning Modalities in Vision Large Language Models via Preference Fine-tuning · ☆79 · Updated 10 months ago
- [EMNLP 2023] The official GitHub page for "Evaluating Object Hallucination in Large Vision-Language Models" · ☆78 · Updated 11 months ago
- Official implementation for the CVPR 2023 paper "Divide and Conquer: Answering Questions with Object Factorization and Compositional Reasoning" · ☆10 · Updated 8 months ago
- ☆17 · Updated last year
- [NeurIPS 2024] The official implementation of "Instruction-Guided Visual Masking" · ☆33 · Updated 3 months ago
- An efficient tuning method for VLMs · ☆78 · Updated 11 months ago
- PyTorch implementation of the paper "FLAME: Frozen Large Language Models Enable Data-Efficient Language-Image Pre-training" · ☆22 · Updated this week