anthropic-experimental / agentic-misalignment
☆533 Updated 5 months ago
Alternatives and similar repositories for agentic-misalignment
Users who are interested in agentic-misalignment are comparing it to the libraries listed below
- Prompts used in the Automated Auditing Blog Post ☆125 Updated 4 months ago
- An alignment auditing agent capable of quickly exploring alignment hypotheses ☆696 Updated 2 weeks ago
- ☆230 Updated last week
- open source interpretability platform 🧠 ☆515 Updated last week
- Collection of evals for Inspect AI ☆297 Updated this week
- Testing baseline LLM performance across various models