ethz-spylab / rlhf-poisoning

Code for the paper "Universal Jailbreak Backdoors from Poisoned Human Feedback"
52 | Updated 11 months ago

Alternatives and similar repositories for rlhf-poisoning:

Users who are interested in rlhf-poisoning are comparing it to the libraries listed below.