anthropic-experimental / automated-auditing
Prompts used in the Automated Auditing Blog Post
☆138 · Jul 24, 2025 · Updated 6 months ago
Alternatives and similar repositories for automated-auditing
Users who are interested in automated-auditing are comparing it to the repositories listed below.
- Independent robustness evaluation of Improving Alignment and Robustness with Short Circuiting ☆18 · Apr 15, 2025 · Updated 10 months ago
- Open Source Replication of Anthropic's Alignment Faking Paper ☆54 · Apr 4, 2025 · Updated 10 months ago
- A library for training crosscoders ☆15 · May 28, 2025 · Updated 8 months ago
- Simple tool to identify and remediate the use of the AWS EC2 IMDSv1. ☆15 · Aug 12, 2021 · Updated 4 years ago
- Information about how the python grammar has changed over time ☆12 · Feb 13, 2024 · Updated 2 years ago
- Unofficial Experiments with AlgebraNets ☆17 · Jun 17, 2020 · Updated 5 years ago
- ☆21 · Jun 22, 2025 · Updated 7 months ago
- ☆14 · Aug 29, 2023 · Updated 2 years ago
- An alignment auditing agent capable of quickly exploring alignment hypothesis ☆887 · Updated this week
- Pytorch implementation on OpenAI's Procgen ppo-baseline, built from scratch. ☆14 · May 17, 2024 · Updated last year
- Measuring and Controlling Persona Drift in Language Model Dialogs ☆21 · Feb 26, 2024 · Updated last year
- ☆71 · Updated this week
- you.com's framework for evaluating deep research systems. ☆67 · May 15, 2025 · Updated 9 months ago
- ☆35 · Dec 14, 2025 · Updated 2 months ago
- ☆82 · Jan 31, 2026 · Updated 2 weeks ago
- imperative programming in TensorFlow ☆18 · Dec 12, 2016 · Updated 9 years ago
- ☆20 · Apr 10, 2025 · Updated 10 months ago
- A formalisation of Cartesian Frames, a perspective on embedded agency, in the HOL theorem prover. ☆20 · Dec 20, 2021 · Updated 4 years ago
- ☆19 · Jan 21, 2023 · Updated 3 years ago
- A backport of __future__ annotations to python<3.7. ☆22 · Nov 5, 2021 · Updated 4 years ago
- Sparse Autoencoder Training Library ☆56 · May 1, 2025 · Updated 9 months ago
- Understanding RL vision Distill article ☆25 · Mar 3, 2023 · Updated 2 years ago
- Code repo for the model organisms and convergent directions of EM papers. ☆49 · Sep 22, 2025 · Updated 4 months ago
- A toolkit that provides a range of model diffing techniques including a UI to visualize them interactively. ☆57 · Updated this week
- Learning latent graph representations of the dancing body with GNNs ☆28 · Dec 8, 2022 · Updated 3 years ago
- ☆267 · Oct 1, 2024 · Updated last year
- ☆30 · Feb 11, 2022 · Updated 4 years ago
- Stochastic Parameter Decomposition ☆65 · Updated this week
- This was designed for interp researchers who want to do research on or with interp agents to give quality of life improvements and fix … ☆124 · Feb 8, 2026 · Updated last week
- Inference API for many LLMs and other useful tools for empirical research ☆104 · Updated this week
- Auditing agents for fine-tuning safety ☆18 · Oct 21, 2025 · Updated 3 months ago
- Foundations of Python Programming ☆40 · Feb 4, 2025 · Updated last year
- This is the source code for solving the Traveling Salesman Problems (TSP) using Monte Carlo tree search (MCTS). ☆34 · Sep 25, 2019 · Updated 6 years ago
- CloudPathSniffer is an open-source, easy to use and extensible Cloud Anomaly Detection platform designed to help security teams to find h… ☆13 · Nov 30, 2023 · Updated 2 years ago
- Neural Error Mitigation of Near-Term Quantum Simulations (arXiv:2105.08086) ☆10 · Jul 6, 2022 · Updated 3 years ago
- FARO - Document Sensitivity Detector ☆10 · Sep 30, 2022 · Updated 3 years ago
- Grouper Python Client Library ☆10 · Apr 18, 2023 · Updated 2 years ago
- ☆10 · Apr 26, 2023 · Updated 2 years ago
- Here, I provided the solution for exercises of IBM Quantum Challenge 2020 ☆10 · Oct 27, 2020 · Updated 5 years ago