UKGovernmentBEIS / control-arena
ControlArena is a collection of settings, model organisms, and protocols for running control experiments.
☆153 · Feb 4, 2026 · Updated last week
Alternatives and similar repositories for control-arena
Users interested in control-arena are comparing it to the libraries listed below.
- ☆20 · May 25, 2024 · Updated last year
- ☆35 · May 9, 2025 · Updated 9 months ago
- Inference API for many LLMs and other useful tools for empirical research · ☆104 · Feb 6, 2026 · Updated last week
- ☆12 · Jul 12, 2024 · Updated last year
- ☆21 · Jun 22, 2025 · Updated 7 months ago
- ☆25 · Nov 11, 2025 · Updated 3 months ago
- Inspect: A framework for large language model evaluations · ☆1,737 · Updated this week
- ☆33 · Jul 9, 2025 · Updated 7 months ago
- Collection of evals for Inspect AI · ☆361 · Updated this week
- ☆17 · Updated this week
- Code for experiments on self-prediction as a way to measure introspection in LLMs · ☆16 · Dec 10, 2024 · Updated last year
- Tools for studying developmental interpretability in neural networks · ☆126 · Dec 30, 2025 · Updated last month
- Code repo for the model organisms and convergent directions of EM papers · ☆49 · Sep 22, 2025 · Updated 4 months ago
- ☆33 · Jun 4, 2025 · Updated 8 months ago
- ☆34 · Feb 20, 2025 · Updated 11 months ago
- Experiments with representation engineering · ☆13 · Feb 28, 2024 · Updated last year
- Code for Tangent Model Composition for Ensembling and Continual Fine-tuning (ICCV 2023) and Tangent Transformers for Composition, Privacy… · ☆13 · May 14, 2024 · Updated last year
- Reimplementation of https://github.com/montemac/algebraic_value_editing in pure PyTorch for efficiency on large models · ☆11 · Jun 28, 2023 · Updated 2 years ago
- Sparse Autoencoder Training Library · ☆56 · May 1, 2025 · Updated 9 months ago
- Stochastic Parameter Decomposition · ☆65 · Updated this week
- ☆28 · Sep 23, 2025 · Updated 4 months ago
- A tiny, easily hackable implementation of a feature dashboard · ☆15 · Oct 21, 2025 · Updated 3 months ago
- ☆20 · Nov 15, 2024 · Updated last year
- Mechanistic Interpretability Visualizations using React · ☆320 · Dec 18, 2024 · Updated last year
- The nnsight package enables interpreting and manipulating the internals of deep learned models · ☆811 · Updated this week
- Decoder-only transformer, built from scratch with PyTorch · ☆32 · Oct 22, 2023 · Updated 2 years ago
- ☆329 · Jul 2, 2024 · Updated last year
- Measuring the situational awareness of language models · ☆40 · Feb 12, 2024 · Updated 2 years ago
- Benchmarks for the Evaluation of LLM Supervision · ☆33 · Jan 19, 2026 · Updated 3 weeks ago
- Experimental LLM interface exploring new ways to use AI to improve human thinking · ☆20 · Updated this week
- Repository with sample code using Apollo's suggested engineering practices · ☆15 · Dec 16, 2024 · Updated last year
- A Kubernetes sandbox environment for use with inspect_ai · ☆26 · Feb 6, 2026 · Updated last week
- Training Sparse Autoencoders on Language Models · ☆1,201 · Updated this week
- ☆267 · Oct 1, 2024 · Updated last year
- Code for my NeurIPS 2024 ATTRIB paper titled "Attribution Patching Outperforms Automated Circuit Discovery" · ☆47 · May 31, 2024 · Updated last year
- Playing around with various jailbreaking techniques ahead of the Gray Swan AI Ultimate Jailbreaking Competition · ☆18 · Oct 6, 2024 · Updated last year
- ☆18 · Feb 4, 2025 · Updated last year
- Automatically create Anki cards from text using language models · ☆20 · Jan 7, 2023 · Updated 3 years ago
- ☆83 · Oct 8, 2025 · Updated 4 months ago