anthropics / evals

☆304

Alternatives and similar repositories for evals

Users who are interested in evals are comparing it to the libraries listed below.

EleutherAI / elk
Keeping language models honest by directly eliciting knowledge encoded in their activations.
☆209 Updated last week
METR / task-standard
METR Task Standard
☆163 Updated 8 months ago
meg-tong / sycophancy-eval
datasets from the paper "Towards Understanding Sycophancy in Language Models"
☆94 Updated 2 years ago
aypan17 / machiavelli
☆138 Updated 3 months ago
justinchiu / openlogprobs
Extract full next-token probabilities via language model APIs
☆247 Updated last year
anthropics / ConstitutionalHarmlessnessPaper
☆242 Updated 2 years ago
collin-burns / discovering_latent_knowledge
☆279 Updated last year
METR / RE-Bench
☆113 Updated last week
google-research / cascades
Python library which enables complex compositions of language models such as scratchpads, chain of thought, tool use, selection-inference…
☆215 Updated 4 months ago
METR / public-tasks
☆104 Updated last week
TransformerLensOrg / CircuitsVis
Mechanistic Interpretability Visualizations using React
☆293 Updated 10 months ago
EleutherAI / concept-erasure
Erasing concepts from neural representations with provable guarantees
☆238 Updated 8 months ago
TransluceAI / observatory
A toolkit for describing model features and intervening on those features to steer behavior.
☆207 Updated 11 months ago
ArthurConmy / Automatic-Circuit-Discovery
☆247 Updated last year
jessicarumbelow / Backwards
☆84 Updated last year
hendrycks / ethics
Aligning AI With Shared Human Values (ICLR 2021)
☆303 Updated 2 years ago
UKGovernmentBEIS / control-arena
ControlArena is a collection of settings, model organisms and protocols - for running control experiments.
☆104 Updated this week
ofirpress / self-ask
Code and data for "Measuring and Narrowing the Compositionality Gap in Language Models"
☆323 Updated last year
anthropics / toy-models-of-superposition
Notebooks accompanying Anthropic's "Toy Models of Superposition" paper
☆129 Updated 3 years ago
normster / llm_rules
RuLES: a benchmark for evaluating rule-following in language models
☆238 Updated 8 months ago
UKGovernmentBEIS / inspect_evals
Collection of evals for Inspect AI
☆264 Updated this week
anthropics / sleeper-agents-paper
Contains random samples referenced in the paper "Sleeper Agents: Training Robustly Deceptive LLMs that Persist Through Safety Training".
☆119 Updated last year
lukasberglund / reversal_curse
☆296 Updated last year
AlignmentResearch / tuned-lens
Tools for understanding how transformer predictions are built layer-by-layer
☆535 Updated 2 months ago
safety-research / safety-tooling
Inference API for many LLMs and other useful tools for empirical research
☆77 Updated last week
moirage / alignment-research-dataset
A dataset of alignment research and code to reproduce it
☆78 Updated 2 years ago
nrimsky / LM-exp
LLM experiments done during SERI MATS - focusing on activation steering / interpreting activation spaces
☆98 Updated 2 years ago
rgreenblatt / arc_draw_more_samples_pub
Draw more samples
☆194 Updated last year
TomFrederik / unseal
Mechanistic Interpretability for Transformer Models
☆53 Updated 3 years ago
EleutherAI / elk-generalization
Investigating the generalization behavior of LM probes trained to predict truth labels: (1) from one annotator to another, and (2) from e…
☆28 Updated last year