xyq7 / GradSafe

Official code for the ACL 2024 paper "GradSafe: Detecting Unsafe Prompts for LLMs via Safety-Critical Gradient Analysis"
40 · Updated 2 weeks ago

Related projects

Alternatives and complementary repositories for GradSafe