wang2226 / Trojan-Activation-Attack

[CIKM 2024] Trojan Activation Attack: Attack Large Language Models using Activation Steering for Safety-Alignment.
20 · Updated 6 months ago

Alternatives and similar repositories for Trojan-Activation-Attack:

Users who are interested in Trojan-Activation-Attack are comparing it to the libraries listed below.