zhiyuanhubj / Long_form_VideoQA
[EMNLP’24 Main] Encoding and Controlling Global Semantics for Long-form Video Question Answering
☆18Updated last year
Alternatives and similar repositories for Long_form_VideoQA
Users who are interested in Long_form_VideoQA are comparing it to the repositories listed below.
- [ECCV'24] Official Implementation of Autoregressive Visual Entity Recognizer.☆14Updated last year
- [AAAI’24 Main] READ: Recurrent Adapter with Partial Video-Language Alignment for Parameter-Efficient Transfer Learning in Low-Resource Vi…☆10Updated last year
- Visual and Embodied Concepts evaluation benchmark☆21Updated 2 years ago
- Code for ProTrix: Building Models for Planning and Reasoning over Tables with Sentence Context☆18Updated last year
- Code and data for ACL 2024 paper on 'Cross-Modal Projection in Multimodal LLMs Doesn't Really Project Visual Attributes to Textual Space'☆18Updated last year
- Less is More: Mitigating Multimodal Hallucination from an EOS Decision Perspective (ACL 2024)☆58Updated last year
- ☆67Updated 2 years ago
- ☆11Updated last year
- Repo for paper "CODIS: Benchmarking Context-Dependent Visual Comprehension for Multimodal Large Language Models".☆12Updated last year
- EMNLP2023 - InfoSeek: A New VQA Benchmark focus on Visual Info-Seeking Questions☆25Updated last year
- FaithScore: Fine-grained Evaluations of Hallucinations in Large Vision-Language Models☆32Updated 2 months ago
- [EMNLP 2024] mDPO: Conditional Preference Optimization for Multimodal Large Language Models.☆85Updated last year
- ☆43Updated 2 years ago
- [EMNLP'22] Weakly-Supervised Temporal Article Grounding☆14Updated 2 years ago
- The released data for paper "Measuring and Improving Chain-of-Thought Reasoning in Vision-Language Models".☆34Updated 2 years ago
- Source code for InBedder, an instruction-following text embedder☆30Updated last year
- ☆68Updated 2 years ago
- Official repo for "AlignGPT: Multi-modal Large Language Models with Adaptive Alignment Capability"☆34Updated last year
- Recent Advances in Visual Dialog☆30Updated 3 years ago
- mPLUG-HalOwl: Multimodal Hallucination Evaluation and Mitigating☆98Updated 2 years ago
- ☆17Updated last year
- Code for 'Why is Winoground Hard? Investigating Failures in Visuolinguistic Compositionality', EMNLP 2022☆31Updated 2 years ago
- ICCV 2023 (Oral) Open-domain Visual Entity Recognition Towards Recognizing Millions of Wikipedia Entities☆43Updated 7 months ago
- Rationale-enhanced language models are better continual relation learners (EMNLP 2023 Main Conference)☆12Updated 2 years ago
- [2025-TMLR] A Survey on the Honesty of Large Language Models☆64Updated last year
- ☆18Updated last year
- [EMNLP'2023 Findings] MoqaGPT, for zero-shot multimodal question answering with LLMs☆13Updated last year
- Github repository for "Why Is Spatial Reasoning Hard for VLMs? An Attention Mechanism Perspective on Focus Areas" (ICML 2025)☆68Updated 8 months ago
- Official Repo for FoodieQA paper (EMNLP 2024)☆19Updated 7 months ago
- Repo for paper: "Paxion: Patching Action Knowledge in Video-Language Foundation Models" Neurips 23 Spotlight☆37Updated 2 years ago