xyq7 / GradSafe
Official Code for ACL 2024 paper "GradSafe: Detecting Unsafe Prompts for LLMs via Safety-Critical Gradient Analysis"
66 | Oct 27, 2024 | Updated last year

Alternatives and similar repositories for GradSafe

Users that are interested in GradSafe are comparing it to the libraries listed below.
