jonathan-roberts1 / zerobench
Code, Data and Red Teaming for ZeroBench
☆53Updated 3 weeks ago
Alternatives and similar repositories for zerobench
Users who are interested in zerobench are comparing it to the repositories listed below
- (ACL 2025) MAmmoTH-VL: Eliciting Multimodal Reasoning with Instruction Tuning at Scale☆49Updated 7 months ago
- Official implementation of "Automated Generation of Challenging Multiple-Choice Questions for Vision Language Model Evaluation" (CVPR 202…☆40Updated 7 months ago
- Multimodal RewardBench☆58Updated 10 months ago
- ☆24Updated 7 months ago
- [SCIS 2024] The official implementation of the paper "MMInstruct: A High-Quality Multi-Modal Instruction Tuning Dataset with Extensive Di…☆62Updated last year
- Matryoshka Multimodal Models☆121Updated 11 months ago
- ☆27Updated last year
- Github repository for "Why Is Spatial Reasoning Hard for VLMs? An Attention Mechanism Perspective on Focus Areas" (ICML 2025)☆68Updated 8 months ago
- This repo contains evaluation code for the paper "BLINK: Multimodal Large Language Models Can See but Not Perceive". https://arxiv.or…☆155Updated 3 months ago
- [ICLR 2025] Video-STaR: Self-Training Enables Video Instruction Tuning with Any Supervision☆72Updated last year
- ☆46Updated last year
- [NeurIPS-24] This is the official implementation of the paper "DeepStack: Deeply Stacking Visual Tokens is Surprisingly Simple and Effect…☆77Updated last year
- ☆80Updated 6 months ago
- ☆50Updated 2 years ago
- The official code of "VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models with Reinforcement Learning" [NeurIPS25]☆176Updated 7 months ago
- ☆46Updated last year
- Preference Learning for LLaVA☆58Updated last year
- This repo contains the code for "MEGA-Bench Scaling Multimodal Evaluation to over 500 Real-World Tasks" [ICLR 2025]☆77Updated 6 months ago
- ☆17Updated last year
- Evaluating Deep Multimodal Reasoning in Vision-Centric Agentic Tasks☆34Updated last month
- This repository is maintained to release dataset and models for multimodal puzzle reasoning.☆113Updated 10 months ago
- Code and datasets for "What’s “up” with vision-language models? Investigating their struggle with spatial reasoning".☆68Updated last year
- ☆109Updated last year
- Evaluation framework for paper "VisualWebBench: How Far Have Multimodal LLMs Evolved in Web Page Understanding and Grounding?"☆62Updated last year
- Official PyTorch implementation and models for paper "Diffusion Beats Autoregressive in Data-Constrained Settings". We find diffusion mod…☆118Updated last week
- [NeurIPS 2024] Calibrated Self-Rewarding Vision Language Models☆84Updated 2 months ago
- ☆105Updated 7 months ago
- Enhancing Large Vision Language Models with Self-Training on Image Comprehension.☆69Updated last year
- Implementation and dataset for paper "Can MLLMs Perform Text-to-Image In-Context Learning?"☆42Updated 7 months ago
- Holistic evaluation of multimodal foundation models