ethz-spylab / rlhf-poisoning

Code for the paper "Universal Jailbreak Backdoors from Poisoned Human Feedback"
46 · Updated 9 months ago

Alternatives and similar repositories for rlhf-poisoning:

Users who are interested in rlhf-poisoning are comparing it to the repositories listed below.