andyrdt / refusal_direction

Code and results accompanying the paper "Refusal in Language Models Is Mediated by a Single Direction".
340 · Jun 13, 2025 · Updated 8 months ago

Alternatives and similar repositories for refusal_direction

Users who are interested in refusal_direction are comparing it to the libraries listed below.

