UKGovernmentBEIS / inspect_k8s_sandbox
A Kubernetes sandbox environment for use with inspect_ai
☆15 Updated 2 weeks ago
Alternatives and similar repositories for inspect_k8s_sandbox
Users interested in inspect_k8s_sandbox are comparing it to the libraries listed below.
- ☆22 Updated 3 weeks ago
- A file utility for accessing both local and remote files through a unified interface. ☆42 Updated last month
- A simple python wrapper for using the Caddy API ☆19 Updated last month
- ☆21 Updated this week
- ☆55 Updated 9 months ago
- METR Task Standard ☆151 Updated 4 months ago
- Collection of evals for Inspect AI ☆167 Updated this week
- 🦠 DeepDecipher: An open source API to MLP neurons ☆9 Updated last year
- ControlArena is a suite of realistic settings, mimicking complex deployment environments, for running control evaluations. This is an alp… ☆69 Updated this week
- a pipeline for using api calls to agnostically convert unstructured data into structured training data ☆30 Updated 9 months ago
- An attribution library for LLMs ☆41 Updated 9 months ago
- A framework for pitting LLMs against each other in an evolving library of games ⚔ ☆32 Updated 2 months ago
- Benchmark structured generation libraries ☆28 Updated 8 months ago
- Code for experiments on self-prediction as a way to measure introspection in LLMs ☆14 Updated 6 months ago
- Investigating the generalization behavior of LM probes trained to predict truth labels: (1) from one annotator to another, and (2) from e… ☆27 Updated last year
- ☆13 Updated last week
- Estimate costs of complex LLM workflows in advance before spending money ☆10 Updated last month
- ☆41 Updated 5 months ago
- Simple repository for training small reasoning models ☆33 Updated 4 months ago
- A framework for evaluating the effectiveness of chain-of-thought reasoning in language models. ☆17 Updated 4 months ago
- ☆18 Updated 2 months ago
- ☆134 Updated 2 months ago
- ☆35 Updated 2 years ago
- Vivaria is METR's tool for running evaluations and conducting agent elicitation research. ☆96 Updated this week
- Landing page + leaderboard for SWE-Bench benchmark ☆6 Updated 2 weeks ago
- Efficiently computing & storing token n-grams from large corpora ☆24 Updated 8 months ago
- CMU Linguistic Annotation Backend ☆15 Updated last year
- Minimum Description Length probing for neural network representations ☆18 Updated 5 months ago
- ☆66 Updated last month
- The official evaluation suite and dynamic data release for MixEval. ☆11 Updated 9 months ago