Run safety benchmarks against AI models and view detailed reports showing how well they performed.
☆120Feb 18, 2026Updated last week
Alternatives and similar repositories for modelbench
Users who are interested in modelbench compare it to the libraries listed below.
- Make it easy to automatically and uniformly measure the behavior of many AI Systems.☆26Oct 2, 2024Updated last year
- ☆10Oct 31, 2022Updated 3 years ago
- [ICLR 2025] On Evaluating the Durability of Safeguards for Open-Weight LLMs☆13Jun 20, 2025Updated 8 months ago
- ☆28Feb 11, 2026Updated 2 weeks ago
- ☆11Mar 13, 2023Updated 2 years ago
- A distributed network based on hash codes and lattices.☆14Aug 16, 2016Updated 9 years ago
- Dataset for AAAI paper "Natural Language Inference in Context - Investigating Contextual Reasoning over Long Texts"☆11Nov 18, 2022Updated 3 years ago
- This repository contains the results and code for the MLPerf™ Inference v4.0 benchmark.☆10Jul 24, 2025Updated 7 months ago
- ☆155Aug 9, 2022Updated 3 years ago
- The official implementation of InfoRM [NeurIPS 2024].☆15Oct 25, 2025Updated 4 months ago
- This repository contains the results and code for the MLPerf™ Training v3.0 benchmark.☆12Aug 10, 2023Updated 2 years ago
- Human preference data for "Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback"☆1,818Jun 17, 2025Updated 8 months ago
- ☆40Aug 10, 2024Updated last year
- Code for our paper "Localizing Lying in Llama"☆13Apr 24, 2025Updated 10 months ago
- ☆13Jul 20, 2021Updated 4 years ago
- Röttger et al. (NAACL 2024): "XSTest: A Test Suite for Identifying Exaggerated Safety Behaviours in Large Language Models"☆128Feb 24, 2025Updated last year
- Interpreting Learned Search and Planning: Reverse-engineering recurrent convolutional networks (DRC) that play Sokoban☆17Jun 29, 2025Updated 8 months ago
- [NeurIPS 2023] Differentially Private Image Classification by Learning Priors from Random Processes☆12Jun 12, 2023Updated 2 years ago
- A Comprehensive Assessment of Trustworthiness in GPT Models☆313Sep 16, 2024Updated last year
- Official Code Repository for [AutoScale📈: Scale-Aware Data Mixing for Pre-Training LLMs] Published as a conference paper at **COLM 2025*…☆13Aug 8, 2025Updated 6 months ago
- [ICML'21 Oral] Improving Lossless Compression Rates via Monte Carlo Bits-Back Coding☆14Jun 10, 2021Updated 4 years ago
- Models for data stocks and training dataset sizes☆18Jul 10, 2024Updated last year
- Applying Reinforcement Learning from Human Feedback to language models to teach them to write short story responses to writing prompts.☆14May 5, 2022Updated 3 years ago
- Long Is More for Alignment: A Simple but Tough-to-Beat Baseline for Instruction Fine-Tuning [ICML 2024]☆21May 2, 2024Updated last year
- Data Banzhaf: A Robust Data Valuation Framework for Machine Learning (AISTATS 2023 Oral)☆18Oct 15, 2023Updated 2 years ago
- Independent robustness evaluation of Improving Alignment and Robustness with Short Circuiting☆18Apr 15, 2025Updated 10 months ago
- Evaluation suite for LLMs☆379Jul 11, 2025Updated 7 months ago
- (NeurIPS 2025) Official implementation for "MJ-Bench: Is Your Multimodal Reward Model Really a Good Judge for Text-to-Image Generation?"☆47Jun 3, 2025Updated 9 months ago
- ☆21Jun 27, 2024Updated last year
- [AAAI 2025] Augmenting Math Word Problems via Iterative Question Composing (https://arxiv.org/abs/2401.09003)☆23Oct 2, 2025Updated 5 months ago
- Machine learning model performance metrics and charts with confidence intervals, optimized with numba to be fast☆16Dec 15, 2021Updated 4 years ago
- Repository for the Bias Benchmark for QA dataset.☆138Jan 8, 2024Updated 2 years ago
- A collection of utilities for writing labeling functions, transformation functions, and slicing functions.☆22Apr 22, 2020Updated 5 years ago
- ☆20Dec 22, 2023Updated 2 years ago
- Fine-tuning Korean natural language processing models☆17Jan 26, 2021Updated 5 years ago
- ☆25Nov 19, 2025Updated 3 months ago
- ☆21Dec 17, 2020Updated 5 years ago
- [NeurIPS 2024 D&B] Evaluating Copyright Takedown Methods for Language Models☆17Jul 17, 2024Updated last year
- Aligning with Human Judgement: The Role of Pairwise Preference in Large Language Model Evaluators (Liu et al.; COLM 2024)☆47Jan 21, 2025Updated last year