wang2226 / Trojan-Activation-Attack

[CIKM 2024] Trojan Activation Attack: Attack Large Language Models using Activation Steering for Safety-Alignment.
☆ 29 · Jul 29, 2024 · Updated last year

Alternatives and similar repositories for Trojan-Activation-Attack

Users who are interested in Trojan-Activation-Attack are comparing it to the libraries listed below.
