safety-research / open-source-alignment-faking
Open Source Replication of Anthropic's Alignment Faking Paper
☆50 Updated 6 months ago
Alternatives and similar repositories for open-source-alignment-faking
Users who are interested in open-source-alignment-faking are comparing it to the repositories listed below.
- Matrix (Multi-Agent daTa geneRation Infra and eXperimentation framework) is a versatile engine for multi-agent conversational data genera… ☆99 Updated last week
- Official Repo for InSTA: Towards Internet-Scale Training For Agents ☆56 Updated 3 months ago
- Source code for the collaborative reasoner research project at Meta FAIR. ☆103 Updated 6 months ago
- Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment