UKGovernmentBEIS / inspect_evals
Collection of evals for Inspect AI
☆97 · Updated this week
Alternatives and similar repositories for inspect_evals:
Users interested in inspect_evals are comparing it to the libraries listed below.
- Improving Alignment and Robustness with Circuit Breakers ☆190 · Updated 6 months ago
- METR Task Standard ☆146 · Updated last month
- ControlArena is a suite of realistic settings, mimicking complex deployment environments, for running control evaluations. This is an alp… ☆28 · Updated this week
- WMDP is an LLM proxy benchmark for hazardous knowledge in bio, cyber, and chemical security. We also release code for RMU, an unlearning m… ☆105 · Updated 10 months ago
- Contains random samples referenced in the paper "Sleeper Agents: Training Robustly Deceptive LLMs that Persist Through Safety Training". ☆94 · Updated last year
- Steering vectors for transformer language models in Pytorch / Huggingface ☆90 · Updated last month
- LLM experiments done during SERI MATS, focusing on activation steering / interpreting activation spaces ☆91 · Updated last year
- Datasets from the paper "Towards Understanding Sycophancy in Language Models" ☆73 · Updated last year
- Vivaria is METR's tool for running evaluations and conducting agent elicitation research. ☆83 · Updated this week
- Run safety benchmarks against AI models and view detailed reports showing how well they performed. ☆83 · Updated this week
- Code and results accompanying the paper "Refusal in Language Models Is Mediated by a Single Direction". ☆195 · Updated 5 months ago
- Delphi was the home of a temple to Phoebus Apollo, which famously had the inscription, 'Know Thyself.' This library lets language models … ☆163 · Updated this week
- ☆53 · Updated 5 months ago
- [ICLR'24 Spotlight] A language model (LM)-based emulation framework for identifying the risks of LM agents with tool use