zhiyuanhubj / Long_form_VideoQA

[EMNLP’24 Main] Encoding and Controlling Global Semantics for Long-form Video Question Answering

☆19

Alternatives and similar repositories for Long_form_VideoQA

Users who are interested in Long_form_VideoQA are comparing it to the repositories listed below.

MrZilinXiao / AutoVER
[ECCV'24] Official Implementation of Autoregressive Visual Entity Recognizer.
☆14 Updated last year
THUNLP-MT / CODIS
Repo for paper "CODIS: Benchmarking Context-Dependent Visual Comprehension for Multimodal Large Language Models".
☆12 Updated 9 months ago
edchengg / infoseek_eval
EMNLP2023 - InfoSeek: A New VQA Benchmark focus on Visual Info-Seeking Questions
☆25 Updated last year
open-vision-language / infoseek
☆58 Updated last year
yale-nlp / TOMATO
☆28 Updated 9 months ago
michelecafagna26 / VinVL
Original VinVL (and Oscar) repo with API designed for an easy inference
☆8 Updated 2 years ago
bcdnlp / FAITHSCORE
FaithScore: Fine-grained Evaluations of Hallucinations in Large Vision-Language Models
☆30 Updated 4 months ago
THUNLP-MT / Brote
☆11 Updated 6 months ago
yuezih / less-is-more
Less is More: Mitigating Multimodal Hallucination from an EOS Decision Perspective (ACL 2024)
☆55 Updated 9 months ago
claws-lab / projection-in-MLLMs
Code and data for ACL 2024 paper on 'Cross-Modal Projection in Multimodal LLMs Doesn't Really Project Visual Attributes to Textual Space'
☆16 Updated last year
X-PLUG / mPLUG-HalOwl
mPLUG-HalOwl: Multimodal Hallucination Evaluation and Mitigating
☆96 Updated last year
luka-group / mDPO
[EMNLP 2024] mDPO: Conditional Preference Optimization for Multimodal Large Language Models.
☆80 Updated 9 months ago
open-vision-language / oven
☆39 Updated last year
lscpku / VITATECS
☆18 Updated last year
Yangyi-Chen / CoTConsistency
The released data for paper "Measuring and Improving Chain-of-Thought Reasoning in Vision-Language Models".
☆33 Updated last year
phellonchen / awesome-visual-dialog
Recent Advances in Visual Dialog
☆30 Updated 2 years ago
YiyangZhou / LURE
[ICLR 2024] Analyzing and Mitigating Object Hallucination in Large Vision-Language Models
☆147 Updated last year
edchengg / oven_eval
ICCV 2023 (Oral) Open-domain Visual Entity Recognition Towards Recognizing Millions of Wikipedia Entities
☆43 Updated 2 months ago
ForJadeForest / Lever-LM
The Code for Lever LM: Configuring In-Context Sequence to Lever Large Vision Language Models
☆16 Updated 10 months ago
MikeWangWZHL / Paxion
Repo for paper: "Paxion: Patching Action Knowledge in Video-Language Foundation Models" Neurips 23 Spotlight
☆37 Updated 2 years ago
limanling / KnowledgeVL-Reading
☆68 Updated 2 years ago
muirbench / MuirBench
A Comprehensive Benchmark for Robust Multi-image Understanding
☆12 Updated 11 months ago
nguyentthong / video-language-understanding
[ACL’24 Findings] Video-Language Understanding: A Survey from Model Architecture, Model Training, and Data Perspectives
☆40 Updated last month
WilliamZR / ProTrix
Code for ProTrix: Building Models for Planning and Reasoning over Tables with Sentence Context
☆18 Updated 8 months ago
DAMO-NLP-SG / CMM
✨✨The Curse of Multi-Modalities (CMM): Evaluating Hallucinations of Large Multimodal Models across Language, Visual, and Audio
☆46 Updated last month
guilk / KAT
Research code for "KAT: A Knowledge Augmented Transformer for Vision-and-Language"
☆66 Updated 3 years ago
PLUM-Lab / MultiInstruct
MultiInstruct: Improving Multi-Modal Zero-Shot Learning via Instruction Tuning
☆135 Updated 2 years ago
zjuchenlong / WSAG
[EMNLP'22] Weakly-Supervised Temporal Article Grounding
☆14 Updated last year
shiqichen17 / AdaptVis
Github repository for "Why Is Spatial Reasoning Hard for VLMs? An Attention Mechanism Perspective on Focus Areas" (ICML 2025)
☆42 Updated 3 months ago
ajd12342 / why-winoground-hard
Code for 'Why is Winoground Hard? Investigating Failures in Visuolinguistic Compositionality', EMNLP 2022
☆30 Updated 2 years ago