Algorithmic-Alignment-Lab / CommonClaim

Explore, Establish, Exploit: Red Teaming Language Models from Scratch
10 | Updated last year

Related projects: