METR / public-tasks

☆108

Alternatives and similar repositories for public-tasks

Users who are interested in public-tasks are comparing it to the libraries listed below.


METR / task-standard
METR Task Standard
☆168 · Updated 10 months ago
METR / RE-Bench
☆119 · Updated last month
google-deepmind / mishax
☆144 · Updated 3 months ago
METR / vivaria
Vivaria is METR's tool for running evaluations and conducting agent elicitation research.
☆120 · Updated 3 weeks ago
goodfire-ai / r1-interpretability
Open source interpretability artefacts for R1.
☆164 · Updated 7 months ago
anthropics / sleeper-agents-paper
Contains random samples referenced in the paper "Sleeper Agents: Training Robustly Deceptive LLMs that Persist Through Safety Training".
☆122 · Updated last year
EleutherAI / elk
Keeping language models honest by directly eliciting knowledge encoded in their activations.
☆214 · Updated this week
anthropics / evals
☆315 · Updated last year
emergent-misalignment / emergent-misalignment
☆229 · Updated this week
PrimeIntellect-ai / prime-environments
Training-Ready RL Environments + Evals
☆185 · Updated this week
justinchiu / openlogprobs
Extract full next-token probabilities via language model APIs
☆248 · Updated last year
TransluceAI / observatory
A toolkit for describing model features and intervening on those features to steer behavior.
☆216 · Updated last year
google-deepmind / dangerous-capability-evaluations
☆62 · Updated 2 months ago
callummcdougall / sae_vis
Create feature-centric and prompt-centric visualizations for sparse autoencoders (like those from Anthropic's published research).
☆231 · Updated 11 months ago
UKGovernmentBEIS / control-arena
ControlArena is a collection of settings, model organisms and protocols for running control experiments.
☆132 · Updated this week
goodfire-ai / scribe
☆54 · Updated 2 months ago
rgreenblatt / arc_draw_more_samples_pub
Draw more samples
☆195 · Updated last year
aypan17 / machiavelli
☆143 · Updated 4 months ago
anthropics / toy-models-of-superposition
Notebooks accompanying Anthropic's "Toy Models of Superposition" paper
☆130 · Updated 3 years ago
jerber / lang-jepa
☆129 · Updated 11 months ago
TransformerLensOrg / CircuitsVis
Mechanistic Interpretability Visualizations using React
☆302 · Updated 11 months ago
timaeus-research / devinterp
Tools for studying developmental interpretability in neural networks.
☆117 · Updated 5 months ago
callummcdougall / sae_visualizer
☆29 · Updated last year
LeonGuertler / UnstableBaselines
☆107 · Updated this week
haizelabs / verdict
Inference-time scaling for LLMs-as-a-judge.
☆314 · Updated last month
JD-P / minihf
MiniHF is an inference, human preference data collection, and fine-tuning tool for local language models. It is intended to help the user…
☆182 · Updated last month
poking-agents / modular-public
☆32 · Updated 6 months ago
ApolloResearch / e2e_sae
Sparse Autoencoder Training Library
☆55 · Updated 7 months ago
EleutherAI / concept-erasure
Erasing concepts from neural representations with provable guarantees
☆239 · Updated 10 months ago
LeonGuertler / TextArena
A Collection of Competitive Text-Based Games for Language Model Evaluation and Reinforcement Learning
☆319 · Updated last month