METR / task-standard

METR Task Standard

☆169

Alternatives and similar repositories for task-standard

Users who are interested in task-standard are comparing it to the libraries listed below.

UKGovernmentBEIS / control-arena
ControlArena is a collection of settings, model organisms and protocols - for running control experiments.
☆145 Updated 3 weeks ago
METR / vivaria
Vivaria is METR's tool for running evaluations and conducting agent elicitation research.
☆128 Updated last month
TransformerLensOrg / CircuitsVis
Mechanistic Interpretability Visualizations using React
☆306 Updated last year
METR / public-tasks
☆113 Updated last month
UKGovernmentBEIS / inspect_evals
Collection of evals for Inspect AI
☆325 Updated this week
METR / RE-Bench
☆127 Updated 2 months ago
callummcdougall / sae_vis
Create feature-centric and prompt-centric visualizations for sparse autoencoders (like those from Anthropic's published research).
☆236 Updated last year
safety-research / safety-tooling
Inference API for many LLMs and other useful tools for empirical research
☆91 Updated 2 weeks ago
callummcdougall / ARENA_2.0
Resources for skilling up in AI alignment research engineering. Covers basics of deep learning, mechanistic interpretability, and RL.
☆236 Updated 4 months ago
anthropics / evals
☆319 Updated last year
TransluceAI / observatory
A toolkit for describing model features and intervening on those features to steer behavior.
☆225 Updated 3 weeks ago
neelnanda-io / 1L-Sparse-Autoencoder
☆132 Updated 2 years ago
poking-agents / modular-public
☆32 Updated 7 months ago
anthropics / sleeper-agents-paper
Contains random samples referenced in the paper "Sleeper Agents: Training Robustly Deceptive LLMs that Persist Through Safety Training".
☆123 Updated last year
ArthurConmy / Automatic-Circuit-Discovery
☆262 Updated last year
goodfire-ai / r1-interpretability
Open source interpretability artefacts for R1.
☆165 Updated 8 months ago
google-deepmind / dangerous-capability-evaluations
☆64 Updated 3 weeks ago
ai-safety-foundation / sparse_autoencoder
Sparse Autoencoder for Mechanistic Interpretability
☆285 Updated last year
timaeus-research / devinterp
Tools for studying developmental interpretability in neural networks.
☆119 Updated last week
redwoodresearch / mlab
Machine Learning for Alignment Bootcamp
☆81 Updated 3 years ago
google-deepmind / mishax
☆150 Updated 4 months ago
EleutherAI / elk
Keeping language models honest by directly eliciting knowledge encoded in their activations.
☆216 Updated 2 weeks ago
emergent-misalignment / emergent-misalignment
☆239 Updated last month
redwoodresearch / alignment_faking_public
☆82 Updated 3 months ago
goodfire-ai / scribe
☆59 Updated 3 months ago
callummcdougall / sae_visualizer
☆29 Updated last year
TransluceAI / docent
☆75 Updated 3 weeks ago
anthropics / toy-models-of-superposition
Notebooks accompanying Anthropic's "Toy Models of Superposition" paper
☆132 Updated 3 years ago
EleutherAI / delphi
Delphi was the home of a temple to Phoebus Apollo, which famously had the inscription, 'Know Thyself.' This library lets language models …
☆234 Updated 2 weeks ago
steering-vectors / steering-vectors
Steering vectors for transformer language models in Pytorch / Huggingface
☆137 Updated 10 months ago