UKGovernmentBEIS / inspect_evalsLinks

Collection of evals for Inspect AI

☆264

Alternatives and similar repositories for inspect_evals

Users that are interested in inspect_evals are comparing it to the libraries listed below

Sorting:

METR / task-standard
METR Task Standard
☆163Updated 8 months ago
UKGovernmentBEIS / control-arena
ControlArena is a collection of settings, model organisms and protocols - for running control experiments.
☆115Updated this week
METR / vivaria
Vivaria is METR's tool for running evaluations and conducting agent elicitation research.
☆116Updated this week
METR / RE-Bench
☆113Updated last week
princeton-pli / hal-harness
☆172Updated this week
GraySwanAI / circuit-breakers
Improving Alignment and Robustness with Circuit Breakers
☆238Updated last year
goodfire-ai / r1-interpretability
Open source interpretability artefacts for R1.
☆163Updated 6 months ago
emergent-misalignment / emergent-misalignment
☆222Updated 7 months ago
TransluceAI / observatory
A toolkit for describing model features and intervening on those features to steer behavior.
☆209Updated 11 months ago
andyrdt / refusal_direction
Code and results accompanying the paper "Refusal in Language Models Is Mediated by a Single Direction".
☆292Updated 4 months ago
haizelabs / verdict
Inference-time scaling for LLMs-as-a-judge.
☆304Updated 3 weeks ago
anthropics / sleeper-agents-paper
Contains random samples referenced in the paper "Sleeper Agents: Training Robustly Deceptive LLMs that Persist Through Safety Training".
☆120Updated last year
haizelabs / dspy-redteam
Red-Teaming Language Models with DSPy
☆235Updated 8 months ago
safety-research / safety-tooling
Inference API for many LLMs and other useful tools for empirical research
☆77Updated this week
anthropics / evals
☆304Updated last year
mlcommons / modelbench
Run safety benchmarks against AI models and view detailed reports showing how well they performed.
☆107Updated this week
google-deepmind / dangerous-capability-evaluations
☆60Updated last month
steering-vectors / steering-vectors
Steering vectors for transformer language models in Pytorch / Huggingface
☆126Updated 8 months ago
meg-tong / sycophancy-eval
datasets from the paper "Towards Understanding Sycophancy in Language Models"
☆95Updated 2 years ago
allenai / olmes
Reproducible, flexible LLM evaluations
☆257Updated last week
mlfoundations / evalchemy
Automatic evals for LLMs
☆550Updated 4 months ago
ServiceNow / TapeAgents
TapeAgents is a framework that facilitates all stages of the LLM Agent development lifecycle
☆298Updated last week
centerforaisafety / wmdp
WMDP is a LLM proxy benchmark for hazardous knowledge in bio, cyber, and chemical security. We also release code for RMU, an unlearning m…
☆146Updated 4 months ago
LeonGuertler / TextArena
A Collection of Competitive Text-Based Games for Language Model Evaluation and Reinforcement Learning
☆300Updated 3 weeks ago
safety-research / persona_vectors
Persona Vectors: Monitoring and Controlling Character Traits in Language Models
☆269Updated 2 months ago
METR / public-tasks
☆104Updated last week
haizelabs / redteaming-resistance-benchmark
☆49Updated last year
ryoungj / ToolEmu
[ICLR'24 Spotlight] A language model (LM)-based emulation framework for identifying the risks of LM agents with tool use
☆167Updated last year
google-deepmind / mishax
☆142Updated last month
AI-secure / DecodingTrust
A Comprehensive Assessment of Trustworthiness in GPT Models
☆306Updated last year