jiazhen-code / PhD
[CVPR25] A ChatGPT-Prompted Visual hallucination Evaluation Dataset, featuring over 100,000 data samples and four advanced evaluation modes. The dataset includes extensive contextual descriptions, counterintuitive images, and clear indicators of hallucination items.
☆16Updated last month
Alternatives and similar repositories for PhD
Users that are interested in PhD are comparing it to the libraries listed below
Sorting:
- [ICML 2024] Official implementation for "HALC: Object Hallucination Reduction via Adaptive Focal-Contrast Decoding"☆87Updated 5 months ago
- This is the first released survey paper on hallucinations of large vision-language models (LVLMs). To keep track of this field and contin…☆70Updated 9 months ago
- 🔎Official code for our paper: "VL-Uncertainty: Detecting Hallucination in Large Vision-Language Model via Uncertainty Estimation".☆35Updated 2 months ago
- [ICML 2024] "Visual-Text Cross Alignment: Refining the Similarity Score in Vision-Language Models"☆51Updated 8 months ago
- ☆46Updated 5 months ago
- [CVPR 2024] How to Configure Good In-Context Sequence for Visual Question Answering☆17Updated 8 months ago
- 🎉 [ACL 2025] The code repository for "Mitigating Visual Forgetting via Take-along Visual Conditioning for Multi-modal Long CoT Reasoning…☆15Updated last week
- ☆119Updated 3 months ago
- [SIGIR 2024] - Simple but Effective Raw-Data Level Multimodal Fusion for Composed Image Retrieval☆35Updated 10 months ago
- [ICLR 2024] Analyzing and Mitigating Object Hallucination in Large Vision-Language Models☆145Updated last year
- [NeurIPS 2023]DDCoT: Duty-Distinct Chain-of-Thought Prompting for Multimodal Reasoning in Language Models☆43Updated last year
- [CVPR' 25] Interleaved-Modal Chain-of-Thought☆40Updated 3 weeks ago
- [CVPR 2024] Retrieval-Augmented Image Captioning with External Visual-Name Memory for Open-World Comprehension☆49Updated last year
- Code for ICLR 2025 Paper: Visual Description Grounding Reduces Hallucinations and Boosts Reasoning in LVLMs☆14Updated 2 weeks ago
- [CVPR 2025] Mitigating Object Hallucinations in Large Vision-Language Models with Assembly of Global and Local Attention☆33Updated 10 months ago
- Can I Trust Your Answer? Visually Grounded Video Question Answering (CVPR'24, Highlight)☆71Updated 10 months ago
- 😎 curated list of awesome LMM hallucinations papers, methods & resources.☆149Updated last year
- Papers about Hallucination in Multi-Modal Large Language Models (MLLMs)☆89Updated 6 months ago
- A comprehensive survey of Composed Multi-modal Retrieval (CMR), including Composed Image Retrieval (CIR) and Composed Video Retrieval (CV…☆37Updated 2 months ago
- [EMNLP'23] The official GitHub page for ''Evaluating Object Hallucination in Large Vision-Language Models''☆83Updated last year
- mPLUG-HalOwl: Multimodal Hallucination Evaluation and Mitigating☆94Updated last year
- [ICLR 2025] MLLM can see? Dynamic Correction Decoding for Hallucination Mitigation☆81Updated 5 months ago
- A Comprehensive Survey on Evaluating Reasoning Capabilities in Multimodal Large Language Models.☆61Updated 2 months ago
- Official github repo for ICCV2023 paper 'Multi-event Video-Text Retrieval'☆18Updated last year
- Official implementation of HawkEye: Training Video-Text LLMs for Grounding Text in Videos☆42Updated last year
- [EMNLP 2024 Findings] The official PyTorch implementation of EchoSight: Advancing Visual-Language Models with Wiki Knowledge.☆61Updated 2 months ago
- CHAIR metric is a rule-based metric for evaluating object hallucination in caption generation.☆29Updated last year
- Weakly Supervised Gaussian Contrastive Grounding with Large Multimodal Models for Video Question Answering [ACM MM'24]☆11Updated 10 months ago
- ☆9Updated last year
- The official implementation of paper "Prototype-based Aleatoric Uncertainty Quantification for Cross-modal Retrieval" accepted by NeurIPS…☆26Updated last year