visual-haystacks / vhs_benchmark
🔥 Official Benchmark Toolkits for "Visual Haystacks: A Vision-Centric Needle-In-A-Haystack Benchmark"
★ 21 · Updated last month
Related projects
Alternatives and complementary repositories for vhs_benchmark
- Unsolvable Problem Detection: Evaluating Trustworthiness of Vision Language Models · ★ 70 · Updated 2 months ago
- SIEVE: Multimodal Dataset Pruning using Image-Captioning Models (CVPR 2024) · ★ 14 · Updated 6 months ago
- VPEval Codebase from Visual Programming for Text-to-Image Generation and Evaluation (NeurIPS 2023) · ★ 43 · Updated 11 months ago
- This repo contains evaluation code for the paper "BLINK: Multimodal Large Language Models Can See but Not Perceive". https://arxiv.or… · ★ 107 · Updated 4 months ago
- Visual Programming for Text-to-Image Generation and Evaluation (NeurIPS 2023) · ★ 52 · Updated last year
- Official implementation and dataset for the NAACL 2024 paper "ComCLIP: Training-Free Compositional Image and Text Matching" · ★ 33 · Updated 3 months ago
- Video-STaR: Self-Training Enables Video Instruction Tuning with Any Supervision · ★ 47 · Updated 4 months ago
- Official implementation of the paper "MMInA: Benchmarking Multihop Multimodal Internet Agents" · ★ 38 · Updated 7 months ago
- [NeurIPS-24] This is the official implementation of the paper "DeepStack: Deeply Stacking Visual Tokens is Surprisingly Simple and Effect… · ★ 32 · Updated 5 months ago
- Official implementation of the paper The Hidden Language of Diffusion Models · ★ 69 · Updated 9 months ago
- Code and datasets for "What's 'up' with vision-language models? Investigating their struggle with spatial reasoning". · ★ 34 · Updated 8 months ago
- Official implementation of our paper "Finetuned Multimodal Language Models are High-Quality Image-Text Data Filters". · ★ 42 · Updated 3 weeks ago
- FuseCap: Large Language Model for Visual Data Fusion in Enriched Caption Generation · ★ 49 · Updated 7 months ago
- ★ 45 · Updated last year
- [Under Review] Official PyTorch implementation code for realizing the technical part of Phantom of Latent representing equipped with enla… · ★ 45 · Updated last month
- Source code for paper "A Spark of Vision-Language Intelligence: 2-Dimensional Autoregressive Transformer for Efficient Finegrained Image … · ★ 55 · Updated last month
- A big_vision-inspired repo that implements a generic Auto-Encoder class capable of representation learning and generative modeling. · ★ 30 · Updated 4 months ago
- Official Repository of Personalized Visual Instruct Tuning · ★ 24 · Updated 2 weeks ago
- Matryoshka Multimodal Models · ★ 82 · Updated this week
- Data-Efficient Multimodal Fusion on a Single GPU · ★ 47 · Updated 6 months ago
- Code for T-MARS data filtering · ★ 35 · Updated last year
- ★ 55 · Updated 6 months ago
- ★ 48 · Updated last year
- Code and Data for Paper: SELMA: Learning and Merging Skill-Specific Text-to-Image Experts with Auto-Generated Data · ★ 32 · Updated 8 months ago
- [EMNLP'23] The official GitHub page for "Evaluating Object Hallucination in Large Vision-Language Models" · ★ 73 · Updated 7 months ago
- ACL'24 (Oral) Tuning Large Multimodal Models for Videos using Reinforcement Learning from AI Feedback · ★ 52 · Updated 2 months ago
- Implementation and dataset for paper "Can MLLMs Perform Text-to-Image In-Context Learning?" · ★ 28 · Updated 2 weeks ago
- ★ 64 · Updated 4 months ago
- Recursive Visual Programming · ★ 16 · Updated this week
- SMILE: A Multimodal Dataset for Understanding Laughter · ★ 13 · Updated last year