anthropic-experimental / automated-auditing
Prompts used in the Automated Auditing Blog Post
☆123, updated 3 months ago
Alternatives and similar repositories for automated-auditing
Users interested in automated-auditing are comparing it to the repositories listed below.
- ☆520, updated 5 months ago
- Public repository containing METR's DVC pipeline for eval data analysis. ☆129, updated 7 months ago
- Super basic implementation (gist-like) of RLMs with REPL environments. ☆248, updated last month
- An alignment auditing agent capable of quickly exploring alignment hypothesis. ☆664, updated this week
- Inference-time scaling for LLMs-as-a-judge. ☆310, updated 2 weeks ago
- Official CLI and Python SDK for Prime Intellect - access GPU compute, remote sandboxes, RL environments, and distributed training infrast… ☆110, updated this week
- Open Source Replication of Anthropic's Alignment Faking Paper. ☆51, updated 7 months ago
- A framework for optimizing DSPy programs with RL. ☆281, updated this week
- j1-micro (1.7B) & j1-nano (600M) are absurdly tiny but mighty reward models. ☆97, updated 4 months ago
- A Tree Search Library with Flexible API for LLM Inference-Time Scaling. ☆488, updated last week
- Training-Ready RL Environments + Evals. ☆174, updated this week
- Open source interpretability artefacts for R1. ☆163, updated 7 months ago
- A subset of jailbreaks automatically discovered by the Haize Labs haizing suite. ☆98, updated 7 months ago
- Collection of scripts and notebooks for OpenAI's latest GPT OSS models. ☆477, updated 2 months ago
- Red-Teaming Language Models with DSPy. ☆235, updated 9 months ago
- ⚖️ Awesome LLM Judges ⚖️ ☆133, updated 6 months ago
- 🧬 The Huxley-Gödel Machine. ☆294, updated last week
- Vivaria is METR's tool for running evaluations and conducting agent elicitation research. ☆120, updated last week
- A better way of testing, inspecting, and analyzing AI Agent traces. ☆40, updated last month
- [ACL 2024] Do Large Language Models Latently Perform Multi-Hop Reasoning? ☆82, updated 8 months ago
- A Text-Based Environment for Interactive Debugging. ☆276, updated this week
- Testing baseline LLMs performance across various models. ☆317, updated last week
- The official code implementation for "Cache-to-Cache: Direct Semantic Communication Between Large Language Models". ☆259, updated 2 weeks ago
- Letting Claude Code develop his own MCP tools :) ☆123, updated 8 months ago
- ☆186, updated last week
- ☆104, updated 3 months ago
- ☆59, updated 9 months ago
- OSS RL environment + evals toolkit. ☆200, updated this week
- ☆174, updated 11 months ago
- ☆86, updated 2 weeks ago