boyiwei / alignment-attribution-code
[ICML 2024] Assessing the Brittleness of Safety Alignment via Pruning and Low-Rank Modifications
89 · Mar 30, 2025 · Updated 11 months ago

Alternatives and similar repositories for alignment-attribution-code

Users who are interested in alignment-attribution-code compare it to the libraries listed below.
