ZHZisZZ / emulated-disalignment
[ACL'24, Outstanding Paper] Emulated Disalignment: Safety Alignment for Large Language Models May Backfire!
39 · Aug 2, 2024 · Updated last year

Alternatives and similar repositories for emulated-disalignment

Users who are interested in emulated-disalignment are comparing it to the libraries listed below.

