redwoodresearch / mlab

Machine Learning for Alignment Bootcamp

☆79

Alternatives and similar repositories for mlab

Users interested in mlab are comparing it to the libraries listed below.

callummcdougall / ARENA_2.0
Resources for skilling up in AI alignment research engineering. Covers basics of deep learning, mechanistic interpretability, and RL.
☆229 · Updated 2 months ago
TransformerLensOrg / CircuitsVis
Mechanistic Interpretability Visualizations using React
☆296 · Updated 10 months ago
METR / task-standard
METR Task Standard
☆163 · Updated 8 months ago
danielmamay / mlab
Machine Learning for Alignment Bootcamp (MLAB).
☆30 · Updated 3 years ago
timaeus-research / devinterp
Tools for studying developmental interpretability in neural networks.
☆111 · Updated 4 months ago
UKGovernmentBEIS / control-arena
ControlArena is a collection of settings, model organisms and protocols - for running control experiments.
☆115 · Updated this week
LRudL / evalugator
(Model-written) LLM evals library
☆18 · Updated 10 months ago
safety-research / safety-tooling
Inference API for many LLMs and other useful tools for empirical research
☆77 · Updated this week
safety-research / safety-examples
☆19 · Updated this week
callummcdougall / ARENA_3.0
☆755 · Updated last month
apartresearch / interpretability-starter
🧠 Starter templates for doing interpretability research
☆74 · Updated 2 years ago
thestephencasper / everything-you-need
we got you bro
☆36 · Updated last year
ndif-team / nnsight
The nnsight package enables interpreting and manipulating the internals of deep learned models.
☆685 · Updated this week
ArthurConmy / Automatic-Circuit-Discovery
☆247 · Updated last year
neelnanda-io / 1L-Sparse-Autoencoder
☆130 · Updated 2 years ago
TransluceAI / docent
☆53 · Updated last month
anthropics / evals
☆305 · Updated last year
redwoodresearch / remix_public
☆19 · Updated 2 years ago
EleutherAI / elk
Keeping language models honest by directly eliciting knowledge encoded in their activations.
☆211 · Updated this week
METR / public-tasks
☆104 · Updated last week
callummcdougall / sae_vis
Create feature-centric and prompt-centric visualizations for sparse autoencoders (like those from Anthropic's published research).
☆224 · Updated 10 months ago
mishajw / repeng
Experiments with representation engineering
☆13 · Updated last year
METR / vivaria
Vivaria is METR's tool for running evaluations and conducting agent elicitation research.
☆116 · Updated last week
anthropics / toy-models-of-superposition
Notebooks accompanying Anthropic's "Toy Models of Superposition" paper
☆129 · Updated 3 years ago
anthropics / PySvelte
A library for bridging Python and HTML/Javascript (via Svelte) for creating interactive visualizations
☆198 · Updated 3 years ago
redwoodresearch / rust_circuit_public
☆65 · Updated 2 years ago
redwoodresearch / interp
Redwood Research's transformer interpretability tools
☆14 · Updated 3 years ago
TomFrederik / unseal
Mechanistic Interpretability for Transformer Models
☆53 · Updated 3 years ago
curt-tigges / probity
☆19 · Updated 6 months ago
alan-cooney / transformer-from-scratch
Decoder only transformer, built from scratch with PyTorch
☆31 · Updated 2 years ago