wang2226 / Trojan-Activation-Attack

[CIKM 2024] Trojan Activation Attack: Attack Large Language Models using Activation Steering for Safety-Alignment.
21 · Updated 7 months ago

Alternatives and similar repositories for Trojan-Activation-Attack:

Users who are interested in Trojan-Activation-Attack are comparing it to the libraries listed below.