THUNLP-MT / CODIS

Repo for paper "CODIS: Benchmarking Context-Dependent Visual Comprehension for Multimodal Large Language Models".

☆12

Alternatives and similar repositories for CODIS

Users that are interested in CODIS are comparing it to the libraries listed below


michelecafagna26 / cider
Pythonic wrappers for CIDEr/CIDEr-D evaluation metrics. Provides CIDEr as well as CIDEr-D (CIDEr Defended), which is more robust to gaming …
☆13 · Updated last year
THUNLP-MT / Brote
☆11 · Updated 8 months ago
zhiyuanhubj / Long_form_VideoQA
[EMNLP’24 Main] Encoding and Controlling Global Semantics for Long-form Video Question Answering
☆18 · Updated last year
luka-group / CoIN
☆12 · Updated last year
ForJadeForest / Lever-LM
The Code for Lever LM: Configuring In-Context Sequence to Lever Large Vision Language Models
☆16 · Updated last year
assafbk / mocha_code
Mitigating Open-Vocabulary Caption Hallucinations (EMNLP 2024)
☆17 · Updated last year
shiqichen17 / AdaptVis
GitHub repository for "Why Is Spatial Reasoning Hard for VLMs? An Attention Mechanism Perspective on Focus Areas" (ICML 2025)
☆47 · Updated 5 months ago
DAMO-NLP-SG / CMM
✨✨The Curse of Multi-Modalities (CMM): Evaluating Hallucinations of Large Multimodal Models across Language, Visual, and Audio
☆48 · Updated 3 months ago
lyan62 / FoodieQA
Official Repo for FoodieQA paper (EMNLP 2024)
☆16 · Updated 3 months ago
WildVision-AI / LMM-Engines
☆17 · Updated 11 months ago
Kamichanw / ICLTestbed
An in-context learning research testbed
☆19 · Updated 7 months ago
kokolerk / TON
[NeurIPS 2025] Think or Not? Selective Reasoning via Reinforcement Learning for Vision-Language Models
☆47 · Updated 2 weeks ago
iOPENCap / awesome-unimodal-training
text-only training or language-free training for multimodal tasks (image/audio/video caption, retrieval, text2image)
☆11 · Updated last year
LightChen233 / M3CoT
☆82 · Updated last year
yuezih / less-is-more
Less is More: Mitigating Multimodal Hallucination from an EOS Decision Perspective (ACL 2024)
☆54 · Updated 11 months ago
VisualWebBench / VisualWebBench
Evaluation framework for paper "VisualWebBench: How Far Have Multimodal LLMs Evolved in Web Page Understanding and Grounding?"
☆59 · Updated 11 months ago
ekonwang / VisuoThink
[Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics]: VisuoThink: Empowering LVLM Reasoning with Mul…
☆30 · Updated 2 months ago
AlignGPT-VL / AlignGPT
Official repo for "AlignGPT: Multi-modal Large Language Models with Adaptive Alignment Capability"
☆33 · Updated last year
zeyofu / ReFocus_Code
Code for ReFocus: Visual Editing as a Chain of Thought for Structured Image Understanding [ICML 2025]
☆39 · Updated 2 months ago
mathllm / Step-Controlled_DPO
☆22 · Updated last year
junyangwang0410 / Attention-LLaVA
A hot-pluggable tool for visualizing LLaVA's attention.
☆23 · Updated last year
FudanDISC / ReForm-Eval
A benchmark for evaluating the capabilities of large vision-language models (LVLMs)
☆45 · Updated last year
GaryStack / MMR-V
Official repository of the video reasoning benchmark MMR-V. Can Your MLLMs "Think with Video"?
☆36 · Updated 3 months ago
SihengLi99 / LLM-Honesty-Survey
[2025-TMLR] A Survey on the Honesty of Large Language Models
☆59 · Updated 10 months ago
luka-group / vlm-knowledge-conflict
Code for paper "Unraveling Cross-Modality Knowledge Conflicts in Large Vision-Language Models."
☆46 · Updated 11 months ago
luka-group / mDPO
[EMNLP 2024] mDPO: Conditional Preference Optimization for Multimodal Large Language Models.
☆82 · Updated 11 months ago
THUNLP-MT / EscapeCraft
Official repo for EscapeCraft (a 3D environment for room escape) and the benchmark MM-Escape. This work was accepted at ICCV 2025.
☆34 · Updated 3 months ago
edchengg / infoseek_eval
EMNLP 2023 - InfoSeek: A New VQA Benchmark Focusing on Visual Info-Seeking Questions
☆25 · Updated last year
yhy-2000 / VideoDeepResearch
☆111 · Updated this week
njucckevin / MM-Self-Improve
A Self-Training Framework for Vision-Language Reasoning
☆86 · Updated 8 months ago