jiazhen-code / PhD
[CVPR25] A ChatGPT-Prompted Visual hallucination Evaluation Dataset, featuring over 100,000 data samples and four advanced evaluation modes. The dataset includes extensive contextual descriptions, counterintuitive images, and clear indicators of hallucination items.
☆14Updated last month
Alternatives and similar repositories for PhD:
Users that are interested in PhD are comparing it to the libraries listed below
- This is the first released survey paper on hallucinations of large vision-language models (LVLMs). To keep track of this field and contin…☆63Updated 8 months ago
- [ICML 2024] Official implementation for "HALC: Object Hallucination Reduction via Adaptive Focal-Contrast Decoding"☆85Updated 3 months ago
- [ICLR 2024] Analyzing and Mitigating Object Hallucination in Large Vision-Language Models☆145Updated 10 months ago
- [NeurIPS 2023]DDCoT: Duty-Distinct Chain-of-Thought Prompting for Multimodal Reasoning in Language Models☆41Updated last year
- [CVPR 2024] How to Configure Good In-Context Sequence for Visual Question Answering☆17Updated 7 months ago
- ☆36Updated 2 years ago
- [SIGIR 2024] - Simple but Effective Raw-Data Level Multimodal Fusion for Composed Image Retrieval☆31Updated 8 months ago
- A comprehensive survey of Composed Multi-modal Retrieval (CMR), including Composed Image Retrieval (CIR) and Composed Video Retrieval (CV…☆20Updated 3 weeks ago
- Official github repo for ICCV2023 paper 'Multi-event Video-Text Retrieval'☆18Updated last year
- 【ICLR 2024, Spotlight】Sentence-level Prompts Benefit Composed Image Retrieval☆80Updated 11 months ago
- This is a summary of research on noisy correspondence. There may be omissions. If anything is missing please get in touch with us. Our em…☆56Updated this week
- The official implementation of paper "Prototype-based Aleatoric Uncertainty Quantification for Cross-modal Retrieval" accepted by NeurIPS…☆24Updated 10 months ago
- [ICLR 2025] TRACE: Temporal Grounding Video LLM via Casual Event Modeling☆76Updated 2 months ago
- ☆19Updated 8 months ago
- Can I Trust Your Answer? Visually Grounded Video Question Answering (CVPR'24, Highlight)☆66Updated 8 months ago
- USER: Unified Semantic Enhancement with Momentum Contrast for Image-Text Retrieval, TIP 2024☆29Updated last year
- [ICML 2024] "Visual-Text Cross Alignment: Refining the Similarity Score in Vision-Language Models"☆49Updated 6 months ago
- ☆107Updated last month
- ☆45Updated 4 months ago
- 🔎Official code for our paper: "VL-Uncertainty: Detecting Hallucination in Large Vision-Language Model via Uncertainty Estimation".☆29Updated last week
- Instruction Tuning in Continual Learning paradigm☆44Updated last month
- mPLUG-HalOwl: Multimodal Hallucination Evaluation and Mitigating☆92Updated last year
- [CVPR 2025] Mitigating Object Hallucinations in Large Vision-Language Models with Assembly of Global and Local Attention☆27Updated 8 months ago
- [NeurIPS2023] Exploring Diverse In-Context Configurations for Image Captioning☆36Updated 4 months ago
- A Comprehensive Survey on Evaluating Reasoning Capabilities in Multimodal Large Language Models.☆44Updated last week
- [ICCV2023] - CTP: Towards Vision-Language Continual Pretraining via Compatible Momentum Contrast and Topology Preservation☆31Updated 5 months ago
- Official pytorch implementation of "RITUAL: Random Image Transformations as a Universal Anti-hallucination Lever in Large Vision Language…☆10Updated 3 months ago
- [ACL’24 Findings] Video-Language Understanding: A Survey from Model Architecture, Model Training, and Data Perspectives☆38Updated 7 months ago
- [CVPR 2024] Retrieval-Augmented Image Captioning with External Visual-Name Memory for Open-World Comprehension☆48Updated 11 months ago
- [EMNLP'23] The official GitHub page for ''Evaluating Object Hallucination in Large Vision-Language Models''☆82Updated last year