nguyentthong / video-language-understanding
[ACL’24 Findings] Video-Language Understanding: A Survey from Model Architecture, Model Training, and Data Perspectives
☆34Updated 2 months ago
Related projects ⓘ
Alternatives and complementary repositories for video-language-understanding
- Official implementation of HawkEye: Training Video-Text LLMs for Grounding Text in Videos☆34Updated 6 months ago
- A Prompted Visual Hallucination Evaluation Dataset, featuring over 100,000 data points and four advanced evaluation modes. The dataset in…☆11Updated 2 weeks ago
- ☆83Updated 2 years ago
- MMICL, a state-of-the-art VLM with the in context learning ability from ICL, PKU☆41Updated last year
- [ICML 2024] Official implementation for "HALC: Object Hallucination Reduction via Adaptive Focal-Contrast Decoding"☆69Updated 6 months ago
- Can I Trust Your Answer? Visually Grounded Video Question Answering (CVPR'24, Highlight)☆58Updated 4 months ago
- NExT-QA: Next Phase of Question-Answering to Explaining Temporal Actions (CVPR'21)☆131Updated 3 months ago
- ☆24Updated 4 months ago
- ☆76Updated last month
- ☆11Updated 4 months ago
- [CVPR 2022] A large-scale public benchmark dataset for video question-answering, especially about evidence and commonsense reasoning. The…☆51Updated 4 months ago
- [Preprint] TRACE: Temporal Grounding Video LLM via Casual Event Modeling☆40Updated 2 weeks ago
- The official GitHub page for ''Evaluating Object Hallucination in Large Vision-Language Models''☆182Updated 7 months ago
- mPLUG-HalOwl: Multimodal Hallucination Evaluation and Mitigating☆79Updated 9 months ago
- Official repository for the A-OKVQA dataset☆64Updated 6 months ago
- This repo contains code for Invariant Grounding for Video Question Answering☆26Updated last year
- HalluciDoctor: Mitigating Hallucinatory Toxicity in Visual Instruction Data (Accepted by CVPR 2024)☆41Updated 4 months ago
- ☆11Updated 11 months ago
- ☆63Updated 5 years ago
- This is the official repository for the paper "Visually-Prompted Language Model for Fine-Grained Scene Graph Generation in an Open World"…☆44Updated 8 months ago
- Official github repo for ICCV2023 paper 'Multi-event Video-Text Retrieval'☆18Updated 9 months ago
- 😎 up-to-date & curated list of awesome LMM hallucinations papers, methods & resources.☆146Updated 8 months ago
- Scanning Only Once: An End-to-end Framework for Fast Temporal Grounding in Long Videos☆20Updated 4 months ago
- [CVPR 2024] Context-Guided Spatio-Temporal Video Grounding☆42Updated 4 months ago
- VQACL: A Novel Visual Question Answering Continual Learning Setting (CVPR'23)☆31Updated 7 months ago
- 【ICLR 2024, Spotlight】Sentence-level Prompts Benefit Composed Image Retrieval☆68Updated 7 months ago
- Less is More: Mitigating Multimodal Hallucination from an EOS Decision Perspective (ACL 2024)☆35Updated 3 weeks ago
- Source code for EMNLP 2022 paper “PEVL: Position-enhanced Pre-training and Prompt Tuning for Vision-language Models”☆47Updated 2 years ago
- (ICML 2024) Improve Context Understanding in Multimodal Large Language Models via Multimodal Composition Learning☆22Updated last month