wang2226 / Trojan-Activation-Attack

[CIKM 2024] Trojan Activation Attack: Attack Large Language Models using Activation Steering for Safety-Alignment.
23 · Updated 9 months ago

Alternatives and similar repositories for Trojan-Activation-Attack

Users who are interested in Trojan-Activation-Attack compare it to the libraries listed below.
