ovguyo / captions-in-VQA
Using image captions with an LLM for zero-shot VQA
☆15 · Updated 11 months ago
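The repository's approach, as its description states, is to answer visual questions by feeding an image caption plus the question to an LLM instead of a vision model. A minimal sketch of the prompt-construction step is below; the captioner and the LLM call are assumed external, and `build_vqa_prompt` is a hypothetical helper name, not part of the repository's API.

```python
def build_vqa_prompt(caption: str, question: str) -> str:
    """Combine an image caption and a question into one LLM prompt.

    Sketch only: in the caption-based zero-shot VQA setting, the image is
    first turned into text by a captioning model, and this text-only prompt
    is then sent to an ordinary LLM, which never sees the pixels.
    """
    return (
        f"Image description: {caption}\n"
        f"Question: {question}\n"
        "Answer the question using only the description above."
    )


# Example: caption produced by any off-the-shelf captioner (assumed input).
prompt = build_vqa_prompt(
    caption="a brown dog running on grass",
    question="What animal is shown?",
)
```

The resulting string would be passed to the LLM of choice; the design choice here is that all visual information must survive the caption bottleneck, which is what makes the method zero-shot with respect to VQA training data.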
Alternatives and similar repositories for captions-in-VQA:
Users interested in captions-in-VQA are comparing it to the repositories listed below.
- [NeurIPS 2023] DDCoT: Duty-Distinct Chain-of-Thought Prompting for Multimodal Reasoning in Language Models ☆38 · Updated 11 months ago
- Official implementation of the CVPR 2023 paper "Divide and Conquer: Answering Questions with Object Factorization and Compositional Reasonin…" ☆10 · Updated 8 months ago
- VQACL: A Novel Visual Question Answering Continual Learning Setting (CVPR'23) ☆33 · Updated 11 months ago
- Official code of IdealGPT ☆34 · Updated last year
- [CVPR 2025] Mitigating Object Hallucinations in Large Vision-Language Models with Assembly of Global and Local Attention ☆22 · Updated 7 months ago
- [TIP 2023] Code for "Plug-and-Play Regulators for Image-Text Matching" ☆31 · Updated 10 months ago
- Implementation of the paper "Your Negative May not Be True Negative: Boosting Image-Text Matching with False Negative Elimination" ☆17 · Updated last year
- Learning Hierarchical Prompt with Structured Linguistic Knowledge for Vision-Language Models (AAAI 2024) ☆67 · Updated last month
- ZoomEye: Enhancing Multimodal LLMs with Human-Like Zooming Capabilities through Tree-Based Image Exploration ☆24 · Updated 2 months ago
- FreeVA: Offline MLLM as Training-Free Video Assistant ☆56 · Updated 8 months ago
- (CVPR 2024) MeaCap: Memory-Augmented Zero-shot Image Captioning ☆43 · Updated 6 months ago
- Look, Compare, Decide: Alleviating Hallucination in Large Vision-Language Models via Multi-View Multi-Path Reasoning ☆19 · Updated 5 months ago
- Code and dataset for the AAAI 2024 paper "LAMM: Label Alignment for Multi-Modal Prompt Learning" ☆32 · Updated last year
- Code for "Label Propagation for Zero-shot Classification with Vision-Language Models" (CVPR 2024) ☆35 · Updated 7 months ago
- (ICML 2024) Improve Context Understanding in Multimodal Large Language Models via Multimodal Composition Learning ☆27 · Updated 5 months ago
- [ICLR 2025] MLLM can see? Dynamic Correction Decoding for Hallucination Mitigation ☆38 · Updated 2 months ago
- Code and data for the ACL 2024 paper "Cross-Modal Projection in Multimodal LLMs Doesn't Really Project Visual Attributes to Textual Space" ☆13 · Updated 7 months ago
- [ICLR 2024, Spotlight] Sentence-level Prompts Benefit Composed Image Retrieval ☆76 · Updated 10 months ago
- Official code repository for "Meta Learning to Bridge Vision and Language Models for Multimodal Few-Shot Learning" (published at ICLR 202…) ☆59 · Updated last year
- Code for the paper "Unified Text-to-Image Generation and Retrieval" ☆13 · Updated 7 months ago
- Code for the paper "All in an Aggregated Image for In-Image Learning" ☆29 · Updated 10 months ago
- Source code of the AAAI 2024 paper "Cross-Modal and Uni-Modal Soft-Label Alignment for Image-Text Retrieval" ☆35 · Updated 11 months ago
- [ACM MM 2024] Improving Composed Image Retrieval via Contrastive Learning with Scaling Positives and Negatives ☆28 · Updated 4 months ago
- [CVPR 2024 CVinW] Multi-Agent VQA: Exploring Multi-Agent Foundation Models on Zero-Shot Visual Question Answering ☆10 · Updated 5 months ago
- HalluciDoctor: Mitigating Hallucinatory Toxicity in Visual Instruction Data (CVPR 2024) ☆44 · Updated 7 months ago
- An efficient tuning method for VLMs ☆78 · Updated 11 months ago
- A simple PyTorch implementation of a CLIP-based baseline for image-text matching ☆13 · Updated last year
- Emerging Pixel Grounding in Large Multimodal Models Without Grounding Supervision ☆31 · Updated 4 months ago
- Implementation of the paper "Unifying Two-Stream Encoders with Transformers for Cross-Modal Retrieval" ☆22 · Updated last year