zhiyuanhubj / Long_form_VideoQA
[EMNLP’24 Main] Encoding and Controlling Global Semantics for Long-form Video Question Answering
☆16 · Updated 5 months ago
Alternatives and similar repositories for Long_form_VideoQA:
Users who are interested in Long_form_VideoQA are comparing it to the repositories listed below.
- [ACL’24 Findings] Video-Language Understanding: A Survey from Model Architecture, Model Training, and Data Perspectives ☆37 · Updated 6 months ago
- Code for ProTrix: Building Models for Planning and Reasoning over Tables with Sentence Context ☆15 · Updated 3 months ago
- Code and data for "Broaden the Vision: Geo-Diverse Visual Commonsense Reasoning" (EMNLP 2021). ☆28 · Updated 3 years ago
- Visual and Embodied Concepts evaluation benchmark ☆21 · Updated last year
- Source code for the paper "Prefix Language Models are Unified Modal Learners" ☆43 · Updated last year
- Code for the ACL 2022 paper "Continual Sequence Generation with Adaptive Compositional Modules" ☆38 · Updated 2 years ago
- A Good Prompt Is Worth Millions of Parameters: Low-resource Prompt-based Learning for Vision-Language Models (ACL 2022) ☆41 · Updated 2 years ago
- Research code for "KAT: A Knowledge Augmented Transformer for Vision-and-Language" ☆63 · Updated 2 years ago
- Code for 'Why is Winoground Hard? Investigating Failures in Visuolinguistic Compositionality', EMNLP 2022 ☆30 · Updated last year
- Official repo for "Imagination-Augmented Natural Language Understanding", NAACL 2022. ☆17 · Updated 2 years ago
- EMNLP2023 - InfoSeek: A New VQA Benchmark focus on Visual Info-Seeking Questions ☆19 · Updated 9 months ago
- This repository provides the data and the codes used in the AAAI'24 paper, COOPER: Coordinating Specialized Agents towards a Complex Dial… ☆23 · Updated last year
- Official Repo for FoodieQA paper (EMNLP 2024) ☆15 · Updated 3 months ago
- [EMNLP 2024] mDPO: Conditional Preference Optimization for Multimodal Large Language Models. ☆66 · Updated 3 months ago
- [NeurIPS 2022 Workshop] A Case Study with Negated Prompts using T0 (3B, 11B), InstructGPT (350M-175B), GPT-3 (350M - 175B) & OPT (125M - … ☆24 · Updated 2 years ago
- Code & Data for our Paper "Alleviating Hallucinations of Large Language Models through Induced Hallucinations" ☆63 · Updated last year
- Repo for paper "CODIS: Benchmarking Context-Dependent Visual Comprehension for Multimodal Large Language Models". ☆11 · Updated 4 months ago
- This repository contains code to evaluate various multimodal large language models using different instructions across multiple multimoda… ☆26 · Updated 10 months ago
- Source code for InBedder, an instruction-following text embedder ☆24 · Updated 4 months ago
- visual question answering prompting recipes for large vision-language models ☆24 · Updated 5 months ago
- The released data for paper "Measuring and Improving Chain-of-Thought Reasoning in Vision-Language Models". ☆32 · Updated last year