arobey1 / advbench
☆41 Updated 2 years ago
Alternatives and similar repositories for advbench:
Users who are interested in advbench are comparing it to the repositories listed below:
- ☆41 Updated last month
- ☆30 Updated 2 months ago
- Implementation of PaCE: Parsimonious Concept Engineering for Large Language Models (NeurIPS 2024) ☆33 Updated 4 months ago
- Improved Few-Shot Jailbreaking Can Circumvent Aligned Language Models and Their Defenses (NeurIPS 2024) ☆57 Updated 2 months ago
- ☆53 Updated 2 years ago
- The official repository of the paper "On the Exploitability of Instruction Tuning". ☆60 Updated last year
- ☆30 Updated 5 months ago
- Independent robustness evaluation of Improving Alignment and Robustness with Short Circuiting ☆14 Updated 7 months ago
- Adversarial Attacks on GPT-4 via Simple Random Search [Dec 2023] ☆43 Updated 10 months ago
- ☆17 Updated 5 months ago
- ☆31 Updated 5 months ago
- ☆31 Updated last year
- Is In-Context Learning Sufficient for Instruction Following in LLMs? [ICLR 2025]