jiazhen-code / PhD
A Prompted Visual Hallucination Evaluation Dataset, featuring over 100,000 data points and four advanced evaluation modes. The dataset includes extensive contextual descriptions, counterintuitive images, and clear indicators of hallucination elements.
☆11 · Updated 2 weeks ago
Related projects
Alternatives and complementary repositories for PhD
- [Preprint] TRACE: Temporal Grounding Video LLM via Causal Event Modeling ☆40 · Updated 2 weeks ago
- [ACL’24 Findings] Video-Language Understanding: A Survey from Model Architecture, Model Training, and Data Perspectives ☆34 · Updated 2 months ago
- This is the first released survey paper on hallucinations of large vision-language models (LVLMs). To keep track of this field and contin… ☆48 · Updated 3 months ago
- ☆11 · Updated 11 months ago
- [ICML 2024] Official implementation for "HALC: Object Hallucination Reduction via Adaptive Focal-Contrast Decoding" ☆69 · Updated 6 months ago
- This is a summary of research on noisy correspondence. There may be omissions; if anything is missing, please get in touch with us. Our em… ☆44 · Updated last month
- ☆34 · Updated 2 years ago
- mPLUG-HalOwl: Multimodal Hallucination Evaluation and Mitigation ☆79 · Updated 9 months ago
- ☆14 · Updated last year
- VQACL: A Novel Visual Question Answering Continual Learning Setting (CVPR'23) ☆31 · Updated 7 months ago
- MMICL, a state-of-the-art VLM from PKU with in-context learning (ICL) ability ☆41 · Updated last year
- Can I Trust Your Answer? Visually Grounded Video Question Answering (CVPR'24, Highlight) ☆58 · Updated 4 months ago
- Less is More: Mitigating Multimodal Hallucination from an EOS Decision Perspective (ACL 2024) ☆35 · Updated 3 weeks ago
- ☆33 · Updated 11 months ago
- Video as Conditional Graph Hierarchy for Multi-Granular Question Answering (AAAI'22, Oral) ☆31 · Updated 2 years ago
- This repo contains code for Invariant Grounding for Video Question Answering ☆26 · Updated last year
- [AAAI 2024] Structure-CLIP: Towards Scene Graph Knowledge to Enhance Multi-modal Structured Representations ☆114 · Updated 5 months ago
- Official implementation of HawkEye: Training Video-Text LLMs for Grounding Text in Videos ☆34 · Updated 6 months ago
- ☆83 · Updated 2 years ago
- NExT-QA: Next Phase of Question-Answering to Explaining Temporal Actions (CVPR'21) ☆27 · Updated last year
- [NeurIPS 2023] DDCoT: Duty-Distinct Chain-of-Thought Prompting for Multimodal Reasoning in Language Models ☆34 · Updated 8 months ago
- Official code for our paper "Model Composition for Multimodal Large Language Models" ☆18 · Updated 6 months ago
- NewsCLIPpings: Automatic Generation of Out-of-Context Multimodal Media, EMNLP 2021 ☆34 · Updated 2 months ago
- The official implementation of the paper "Prototype-based Aleatoric Uncertainty Quantification for Cross-modal Retrieval", accepted by NeurIPS… ☆21 · Updated 6 months ago
- [EACL'23] COVID-VTS: Fact Extraction and Verification on Short Video Platforms ☆9 · Updated last year
- ☆24 · Updated 4 months ago
- SNIFFER: Multimodal Large Language Model for Explainable Out-of-Context Misinformation Detection ☆32 · Updated 3 months ago
- ☆76 · Updated last month
- The code of the paper "Negative Pre-aware for Noisy Cross-modal Matching" in AAAI 2024. ☆19 · Updated 6 months ago