Code, Data and Red Teaming for ZeroBench
☆54Dec 23, 2025Updated 2 months ago
Alternatives and similar repositories for zerobench
Users who are interested in zerobench are comparing it to the libraries listed below
- NeurIPS 2024: SciFIBench: Benchmarking Large Multimodal Models for Scientific Figure Interpretation☆13May 24, 2025Updated 9 months ago
- A collection of papers tackling automatic fact-checking (particularly of AI-generated content)☆14Nov 3, 2023Updated 2 years ago
- Accompanying repo for CVPRW'24: Charting New Territories: Exploring the Geographic and Geospatial Capabilities of Multimodal LLMs☆27May 24, 2025Updated 9 months ago
- Accompanying repo for NeurIPSW'23: GPT4GEO: How a Language Model Sees the World's Geography☆27May 24, 2025Updated 9 months ago
- [ICML 2025] Code for "R2-T2: Re-Routing in Test-Time for Multimodal Mixture-of-Experts"☆19Mar 10, 2025Updated 11 months ago
- Code and benchmark for the paper: "A Practitioner's Guide to Continual Multimodal Pretraining" [NeurIPS'24]☆62Dec 10, 2024Updated last year
- iLLaVA: An Image is Worth Fewer Than 1/3 Input Tokens in Large Multimodal Models☆21Jan 29, 2025Updated last year
- A Framework for Evaluating AI Agent Safety in Realistic Environments☆30Oct 2, 2025Updated 5 months ago
- Official code for the paper "Does CLIP's Generalization Performance Mainly Stem from High Train-Test Similarity?" (ICLR 2024)☆10Aug 26, 2024Updated last year
- ☆11Oct 20, 2023Updated 2 years ago
- [NAACL'25] "Revealing the Barriers of Language Agents in Planning"☆13Jun 22, 2025Updated 8 months ago
- [ICML 2023] Instant Soup: Cheap Pruning Ensembles in A Single Pass Can Draw Lottery Tickets from Large Models. Ajay Jaiswal, Shiwei Liu, Ti…☆11Nov 28, 2023Updated 2 years ago
- This is an implementation of the paper "Are We Done with Object-Centric Learning?"☆12Sep 11, 2025Updated 5 months ago
- ☆13Jan 22, 2025Updated last year
- [ICLR 26] The official code repository for the paper "Mirage or Method? How Model–Task Alignment Induces Divergent RL Conclusions".☆15Feb 9, 2026Updated 3 weeks ago
- ☆19Jul 31, 2025Updated 7 months ago
- ☆12Dec 4, 2024Updated last year
- ☆13May 12, 2025Updated 9 months ago
- ☆46Dec 30, 2024Updated last year
- [ICML 2023] "Robust Weight Signatures: Gaining Robustness as Easy as Patching Weights?" by Ruisi Cai, Zhenyu Zhang, Zhangyang Wang☆16May 4, 2023Updated 2 years ago
- ☆13Jul 19, 2022Updated 3 years ago
- The Good, The Bad, and The Greedy: Evaluation of LLMs Should Not Ignore Non-Determinism☆30Jul 17, 2024Updated last year
- ☆21Jul 25, 2025Updated 7 months ago
- [ECCV 2024] FALIP: Visual Prompt as Foveal Attention Boosts CLIP Zero-Shot Performance☆17Sep 11, 2024Updated last year
- [TACL] Do Vision and Language Models Share Concepts? A Vector Space Alignment Study☆16Nov 22, 2024Updated last year
- [ICCV 2025] Official implementation of "What Makes for Text to 360-degree Panorama Generation with Stable Diffusion?"☆18Aug 7, 2025Updated 6 months ago
- An automated data pipeline scaling RL to pretraining levels☆73Oct 11, 2025Updated 4 months ago
- [NeurIPS 2025] "Reasoning Models Better Express Their Confidence"☆22Nov 19, 2025Updated 3 months ago
- ☆33Jul 9, 2025Updated 7 months ago
- ☆41Jan 4, 2026Updated 2 months ago
- ☆24May 13, 2025Updated 9 months ago
- OpenVLThinker: An Early Exploration to Vision-Language Reasoning via Iterative Self-Improvement☆128Jul 24, 2025Updated 7 months ago
- ☆49Apr 4, 2025Updated 11 months ago
- [SCIS] MULTI-Benchmark: Multimodal Understanding Leaderboard with Text and Images☆44Nov 19, 2025Updated 3 months ago
- We introduce a new approach, Token Reduction using CLIP Metric (TRIM), aimed at improving the efficiency of MLLMs without sacrificing their…☆20Jan 11, 2026Updated last month
- [ICLR 2026] Official repository of "InternSVG: Towards Unified SVG Tasks with Multimodal Large Language Models".☆91Feb 6, 2026Updated 3 weeks ago
- Official code for the paper: "Metadata Archaeology"☆19May 10, 2023Updated 2 years ago
- ☆21Oct 10, 2023Updated 2 years ago
- Implementation of MixCE method described in ACL 2023 paper by Zhang et al.☆20May 29, 2023Updated 2 years ago