GaryJiajia / OFv2_ICL_VQA
[CVPR 2024] How to Configure Good In-Context Sequence for Visual Question Answering
☆17Updated 6 months ago
Alternatives and similar repositories for OFv2_ICL_VQA:
Users who are interested in OFv2_ICL_VQA are comparing it to the repositories listed below
- The Code for Lever LM: Configuring In-Context Sequence to Lever Large Vision Language Models☆14Updated 5 months ago
- [NeurIPS2023] Exploring Diverse In-Context Configurations for Image Captioning☆36Updated 3 months ago
- VQACL: A Novel Visual Question Answering Continual Learning Setting (CVPR'23)☆35Updated 11 months ago
- [CVPR25] A ChatGPT-Prompted Visual hallucination Evaluation Dataset, featuring over 100,000 data samples and four advanced evaluation mod…☆13Updated 3 weeks ago
- SotA text-only image/video method (IJCAI 2023)☆16Updated last year
- Source code of our AAAI 2024 paper "Cross-Modal and Uni-Modal Soft-Label Alignment for Image-Text Retrieval"☆35Updated 11 months ago
- Source code for EMNLP 2022 paper “PEVL: Position-enhanced Pre-training and Prompt Tuning for Vision-language Models”☆48Updated 2 years ago
- Video as Conditional Graph Hierarchy for Multi-Granular Question Answering (AAAI'22, Oral)☆34Updated 2 years ago
- Repository for an end-to-end image captioning method, PTSN (ACM MM22).☆61Updated 2 years ago
- [NeurIPS 2023] DDCoT: Duty-Distinct Chain-of-Thought Prompting for Multimodal Reasoning in Language Models☆41Updated last year
- This is the first released survey paper on hallucinations of large vision-language models (LVLMs). To keep track of this field and contin…☆63Updated 7 months ago
- [CVPR 2024] Retrieval-Augmented Image Captioning with External Visual-Name Memory for Open-World Comprehension☆48Updated 11 months ago
- The code for the paper "A Symmetric Dual Encoding Dense Retrieval Framework for Knowledge-Intensive Visual Question Answering"☆12Updated last year
- MMICL, a state-of-the-art VLM with in-context learning ability, from ICL, PKU☆47Updated last year
- [CVPR'2022 Oral] The Devil is in the Labels: Noisy Label Correction for Robust Scene Graph Generation☆31Updated last year
- Official repository for the A-OKVQA dataset☆79Updated 10 months ago
- The code for the paper "A Differentiable Semantic Metric Approximation in Probabilistic Embedding for Cross-Modal Retrieval" accepted b…☆19Updated last year
- [Paper][AAAI 2024] Structure-CLIP: Towards Scene Graph Knowledge to Enhance Multi-modal Structured Representations☆134Updated 9 months ago
- Official implementation for the MM'22 paper.☆12Updated 2 years ago
- Code for WACV 2023 paper "VLC-BERT: Visual Question Answering with Contextualized Commonsense Knowledge"☆21Updated last year
- This is the official repository for the paper "Visually-Prompted Language Model for Fine-Grained Scene Graph Generation in an Open World"…☆47Updated last year
- A comprehensive survey of Composed Multi-modal Retrieval (CMR), including Composed Image Retrieval (CIR) and Composed Video Retrieval (CV…☆20Updated 3 weeks ago
- [EMNLP 2024 Findings] The official PyTorch implementation of EchoSight: Advancing Visual-Language Models with Wiki Knowledge.☆55Updated last week
- Can I Trust Your Answer? Visually Grounded Video Question Answering (CVPR'24, Highlight)☆66Updated 8 months ago