nguyentthong / video-language-understanding
[ACL’24 Findings] Video-Language Understanding: A Survey from Model Architecture, Model Training, and Data Perspectives
☆35Updated 4 months ago
Alternatives and similar repositories for video-language-understanding:
Users interested in video-language-understanding are comparing it to the repositories listed below
- [ICLR 2024] Analyzing and Mitigating Object Hallucination in Large Vision-Language Models☆139Updated 8 months ago
- The official GitHub page for "Evaluating Object Hallucination in Large Vision-Language Models"☆194Updated 9 months ago
- ☆25Updated 6 months ago
- ☆29Updated last year
- [EMNLP'23] The official GitHub page for "Evaluating Object Hallucination in Large Vision-Language Models"☆77Updated 9 months ago
- Official repository for the A-OKVQA dataset☆69Updated 8 months ago
- [EMNLP’24 Main] Encoding and Controlling Global Semantics for Long-form Video Question Answering☆16Updated 3 months ago
- HalluciDoctor: Mitigating Hallucinatory Toxicity in Visual Instruction Data (Accepted by CVPR 2024)☆43Updated 6 months ago
- A Prompted Visual Hallucination Evaluation Dataset, featuring over 100,000 data points and four advanced evaluation modes. The dataset in…☆11Updated last month
- Official Code for the ICCV23 Paper: "LexLIP: Lexicon-Bottlenecked Language-Image Pre-Training for Large-Scale Image-Text Sparse Retrieval…☆42Updated last year
- NExT-QA: Next Phase of Question-Answering to Explaining Temporal Actions (CVPR'21)☆139Updated 5 months ago
- [ICLR 2023] This is the code repo for our ICLR'23 paper "Universal Vision-Language Dense Retrieval: Learning A Unified Representation Spa…☆50Updated 6 months ago
- [ICML 2024] Official implementation for "HALC: Object Hallucination Reduction via Adaptive Focal-Contrast Decoding"☆77Updated last month
- Source code for EMNLP 2022 paper “PEVL: Position-enhanced Pre-training and Prompt Tuning for Vision-language Models”☆48Updated 2 years ago
- [ICLR 2024, Spotlight] Sentence-level Prompts Benefit Composed Image Retrieval☆73Updated 9 months ago
- ICCV 2023 (Oral) Open-domain Visual Entity Recognition Towards Recognizing Millions of Wikipedia Entities☆35Updated 4 months ago
- Official implementation of HawkEye: Training Video-Text LLMs for Grounding Text in Videos☆36Updated 8 months ago
- ☆38Updated last year
- ☆64Updated 5 years ago
- This is the first released survey paper on hallucinations of large vision-language models (LVLMs). To keep track of this field and contin…☆57Updated 5 months ago
- 😎 curated list of awesome LMM hallucinations papers, methods & resources.☆147Updated 9 months ago
- Can I Trust Your Answer? Visually Grounded Video Question Answering (CVPR'24, Highlight)☆63Updated 6 months ago
- ☆11Updated last week
- [Preprint] TRACE: Temporal Grounding Video LLM via Causal Event Modeling☆57Updated 2 weeks ago
- ☆88Updated 2 years ago
- Official GitHub repo for the ICCV 2023 paper "Multi-event Video-Text Retrieval"☆18Updated 11 months ago
- Less is More: Mitigating Multimodal Hallucination from an EOS Decision Perspective (ACL 2024)☆40Updated 2 months ago
- MMICL, a state-of-the-art VLM with in-context learning ability, from PKU☆44Updated last year
- [CVPR 2022] A large-scale public benchmark dataset for video question-answering, especially about evidence and commonsense reasoning. The…☆52Updated 6 months ago
- [CVPR 2024] Retrieval-Augmented Image Captioning with External Visual-Name Memory for Open-World Comprehension☆43Updated 9 months ago