goodfire-ai / goodfire-sdkLinks

Ember is a hosted API/SDK that lets you shape AI model behavior by directly controlling a model's internal units of computation, or "features". With Ember, you can modify features to precisely control model outputs, or use them as building blocks for tasks like classification.

☆37

Alternatives and similar repositories for goodfire-sdk

Users that are interested in goodfire-sdk are comparing it to the libraries listed below

Sorting:

METR / task-standard
METR Task Standard
☆167Updated 9 months ago
METR / vivaria
Vivaria is METR's tool for running evaluations and conducting agent elicitation research.
☆120Updated last week
TransformerLensOrg / CircuitsVis
Mechanistic Interpretability Visualizations using React
☆301Updated 11 months ago
goodfire-ai / scribe
☆53Updated last month
METR / public-tasks
☆106Updated last week
jbloomAus / SAEDashboard
☆79Updated last month
amack315 / unsupervised-steering-vectors
☆36Updated last year
anthropics / toy-models-of-superposition
Notebooks accompanying Anthropic's "Toy Models of Superposition" paper
☆130Updated 3 years ago
redwoodresearch / mlab
Machine Learning for Alignment Bootcamp
☆81Updated 3 years ago
TransluceAI / observatory
A toolkit for describing model features and intervening on those features to steer behavior.
☆214Updated last year
goodfire-ai / r1-interpretability
Open source interpretability artefacts for R1.
☆163Updated 7 months ago
UKGovernmentBEIS / control-arena
ControlArena is a collection of settings, model organisms and protocols - for running control experiments.
☆128Updated this week
callummcdougall / sae_vis
Create feature-centric and prompt-centric visualizations for sparse autoencoders (like those from Anthropic's published research).
☆227Updated 11 months ago
anthropics / sleeper-agents-paper
Contains random samples referenced in the paper "Sleeper Agents: Training Robustly Deceptive LLMs that Persist Through Safety Training".
☆122Updated last year
safety-research / safety-tooling
Inference API for many LLMs and other useful tools for empirical research
☆80Updated this week
callummcdougall / ARENA_2.0
Resources for skilling up in AI alignment research engineering. Covers basics of deep learning, mechanistic interpretability, and RL.
☆232Updated 3 months ago
hijohnnylin / neuronpedia
open source interpretability platform 🧠
☆486Updated this week
LRudL / evalugator
(Model-written) LLM evals library
☆18Updated 11 months ago
emergent-misalignment / emergent-misalignment
☆226Updated 3 weeks ago
JasonGross / guarantees-based-mechanistic-interpretability
☆17Updated last week
ArthurConmy / Automatic-Circuit-Discovery
☆253Updated last year
redwoodresearch / alignment_faking_public
☆81Updated last month
safety-research / safety-examples
☆19Updated last week
anthropics / evals
☆310Updated last year
anthropics / sycophancy-to-subterfuge-paper
☆25Updated last year
EleutherAI / elk
Keeping language models honest by directly eliciting knowledge encoded in their activations.
☆212Updated last week
UKGovernmentBEIS / inspect_evals
Collection of evals for Inspect AI
☆284Updated this week
ai-safety-foundation / sparse_autoencoder
Sparse Autoencoder for Mechanistic Interpretability
☆284Updated last year
poking-agents / modular-public
☆32Updated 5 months ago
alan-cooney / transformer-from-scratch
Decoder only transformer, built from scratch with PyTorch
☆31Updated 2 years ago