neelsjain / baseline-defenses

Official Code for "Baseline Defenses for Adversarial Attacks Against Aligned Language Models"
17 Updated 10 months ago

Related projects: