xyq7 / GradSafe
Official Code for ACL 2024 paper "GradSafe: Detecting Unsafe Prompts for LLMs via Safety-Critical Gradient Analysis"
65 · Oct 27, 2024 · Updated last year

Alternatives and similar repositories for GradSafe

Users who are interested in GradSafe compare it to the libraries listed below.
