jonathan-roberts1 / zerobenchLinks

Code, Data and Red Teaming for ZeroBench

☆50

Alternatives and similar repositories for zerobench

Users that are interested in zerobench are comparing it to the libraries listed below

Sorting:

mbzuai-oryx / Agent-X
Evaluating Deep Multimodal Reasoning in Vision-Centric Agentic Tasks
☆33Updated this week
facebookresearch / multimodal_rewardbench
Multimodal RewardBench
☆55Updated 9 months ago
mu-cai / matryoshka-mm
Matryoshka Multimodal Models
☆116Updated 10 months ago
yuhui-zh15 / AutoConverter
Official implementation of "Automated Generation of Challenging Multiple-Choice Questions for Vision Language Model Evaluation" (CVPR 202…
☆39Updated 6 months ago
si0wang / ViCrit
☆24Updated 5 months ago
wmn-231314 / diffusion-data-constraint
Official PyTorch implementation and models for paper "Diffusion Beats Autoregressive in Data-Constrained Settings". We find diffusion mod…
☆109Updated last month
locuslab / llava-token-compression
☆45Updated last year
mlfoundations / VisIT-Bench
☆50Updated 2 years ago
Wang-ML-Lab / multimodal-needle-in-a-haystack
[NAACL 2025 Oral] Multimodal Needle in a Haystack (MMNeedle): Benchmarking Long-Context Capability of Multimodal Large Language Models
☆50Updated 6 months ago
MAmmoTH-VL / MAmmoTH-VL
(ACL 2025) MAmmoTH-VL: Eliciting Multimodal Reasoning with Instruction Tuning at Scale
☆48Updated 5 months ago
zeyofu / BLINK_Benchmark
This repo contains evaluation code for the paper "BLINK: Multimodal Large Language Models Can See but Not Perceive". https://arxiv.or…
☆150Updated 2 months ago
YuxiXie / V-DPO
Preference Learning for LLaVA
☆54Updated last year
zeyofu / ReFocus_Code
Codes for ReFocus: Visual Editing as a Chain of Thought for Structured Image Understanding [ICML 2025]]
☆43Updated 4 months ago
shiqichen17 / AdaptVis
Github repository for "Why Is Spatial Reasoning Hard for VLMs? An Attention Mechanism Perspective on Focus Areas" (ICML 2025)
☆61Updated 6 months ago
VisualWebBench / VisualWebBench
Evaluation framework for paper "VisualWebBench: How Far Have Multimodal LLMs Evolved in Web Page Understanding and Grounding?"
☆61Updated last year
HanSolo9682 / CounterCurate
This is the implementation of CounterCurate, the data curation pipeline of both physical and semantic counterfactual image-caption pairs.
☆19Updated last year
orrzohar / Video-STaR
[ICLR 2025] Video-STaR: Self-Training Enables Video Instruction Tuning with Any Supervision
☆71Updated last year
g-luo / vlm_cross_modal_reps
Official PyTorch Implementation for Vision-Language Models Create Cross-Modal Task Representations, ICML 2025
☆31Updated 6 months ago
alhojel / visual_task_vectors
☆39Updated last year
rohan598 / ConTextual
☆27Updated last year
AtsuMiyai / UPD
[ACL2025] Unsolvable Problem Detection: Robust Understanding Evaluation for Large Multimodal Models
☆79Updated 6 months ago
WildVision-AI / LMM-Engines
☆17Updated last year
yuecao0119 / MMInstruct
[SCIS 2024] The official implementation of the paper "MMInstruct: A High-Quality Multi-Modal Instruction Tuning Dataset with Extensive Di…
☆59Updated last year
amitakamath / whatsup_vlms
Code and datasets for "What’s “up” with vision-language models? Investigating their struggle with spatial reasoning".
☆66Updated last year
si0wang / VisVM
☆46Updated 11 months ago
chenllliang / DnD-Transformer
[ICLR 2025] Source code for paper "A Spark of Vision-Language Intelligence: 2-Dimensional Autoregressive Transformer for Efficient Finegr…
☆79Updated 11 months ago
yihedeng9 / STIC
Enhancing Large Vision Language Models with Self-Training on Image Comprehension.
☆70Updated last year
declare-lab / LLM-PuzzleTest
This repository is maintained to release dataset and models for multimodal puzzle reasoning.
☆111Updated 9 months ago
UW-Madison-Lee-Lab / CoBSAT
Implementation and dataset for paper "Can MLLMs Perform Text-to-Image In-Context Learning?"
☆41Updated 5 months ago
Victorwz / MLM_Filter
Official implementation of our paper "Finetuned Multimodal Language Models are High-Quality Image-Text Data Filters".
☆68Updated 7 months ago