jiazhen-code / PhD
A Prompted Visual Hallucination Evaluation Dataset, featuring over 100,000 data points and four advanced evaluation modes. The dataset includes extensive contextual descriptions, counterintuitive images, and clear indicators of hallucination elements.
☆11Updated last month
Alternatives and similar repositories for PhD:
Users who are interested in PhD are comparing it to the repositories listed below.
- This is the first released survey paper on hallucinations of large vision-language models (LVLMs). To keep track of this field and contin…☆57Updated 5 months ago
- [ACL’24 Findings] Video-Language Understanding: A Survey from Model Architecture, Model Training, and Data Perspectives☆35Updated 4 months ago
- Source code for EMNLP 2022 paper “PEVL: Position-enhanced Pre-training and Prompt Tuning for Vision-language Models”☆48Updated 2 years ago
- Video Graph Transformer for Video Question Answering (ECCV'22)☆46Updated last year
- [CVPR 2024] How to Configure Good In-Context Sequence for Visual Question Answering☆16Updated 4 months ago
- Official implementation of HawkEye: Training Video-Text LLMs for Grounding Text in Videos☆36Updated 8 months ago
- Video as Conditional Graph Hierarchy for Multi-Granular Question Answering (AAAI'22, Oral)☆34Updated 2 years ago
- The official implementation of paper "Prototype-based Aleatoric Uncertainty Quantification for Cross-modal Retrieval" accepted by NeurIPS…☆22Updated 8 months ago
- [Preprint] TRACE: Temporal Grounding Video LLM via Causal Event Modeling☆56Updated 2 weeks ago
- [SIGIR 2024] - Simple but Effective Raw-Data Level Multimodal Fusion for Composed Image Retrieval☆30Updated 6 months ago
- [Paper][AAAI2024]Structure-CLIP: Towards Scene Graph Knowledge to Enhance Multi-modal Structured Representations☆126Updated 6 months ago
- Can I Trust Your Answer? Visually Grounded Video Question Answering (CVPR'24, Highlight)☆63Updated 6 months ago
- Code for the paper "A Differentiable Semantic Metric Approximation in Probabilistic Embedding for Cross-Modal Retrieval" accepted b…☆19Updated last year
- [CVPR 2022] A large-scale public benchmark dataset for video question-answering, especially about evidence and commonsense reasoning. The…☆52Updated 6 months ago
- mPLUG-HalOwl: Multimodal Hallucination Evaluation and Mitigating☆87Updated 11 months ago
- VQACL: A Novel Visual Question Answering Continual Learning Setting (CVPR'23)☆33Updated 9 months ago
- [ICML 2024] Official implementation for "HALC: Object Hallucination Reduction via Adaptive Focal-Contrast Decoding"☆77Updated last month
- NewsCLIPpings: Automatic Generation of Out-of-Context Multimodal Media, EMNLP 2021☆38Updated 4 months ago
- Official repository for the A-OKVQA dataset☆69Updated 8 months ago
- [ICLR 2023] This is the code repo for our ICLR'23 paper "Universal Vision-Language Dense Retrieval: Learning A Unified Representation Spa…☆50Updated 6 months ago
- USER: Unified Semantic Enhancement with Momentum Contrast for Image-Text Retrieval, TIP 2024☆28Updated 9 months ago
- Official Code for the ICCV23 Paper: "LexLIP: Lexicon-Bottlenecked Language-Image Pre-Training for Large-Scale Image-Text Sparse Retrieval…☆42Updated last year
- This is a summary of research on noisy correspondence. There may be omissions. If anything is missing please get in touch with us. Our em…☆50Updated last week
- [Paper][IJCKG 2022] LaKo: Knowledge-driven Visual Question Answering via Late Knowledge-to-Text Injection☆25Updated 11 months ago
- Dynamic Modality Interaction Modeling for Image-Text Retrieval. SIGIR'21☆67Updated 2 years ago
- [ICLR 2024, Spotlight] Sentence-level Prompts Benefit Composed Image Retrieval☆73Updated 9 months ago