Improbable-AI / curiosity_redteam
Official implementation of ICLR'24 paper, "Curiosity-driven Red Teaming for Large Language Models" (https://openreview.net/pdf?id=4KqkizXgXU)
☆75 · Updated last year
Alternatives and similar repositories for curiosity_redteam:
Users interested in curiosity_redteam are comparing it to the libraries listed below
- Improved Few-Shot Jailbreaking Can Circumvent Aligned Language Models and Their Defenses (NeurIPS 2024) ☆61 · Updated 3 months ago
- [ICLR'24] RAIN: Your Language Models Can Align Themselves without Finetuning ☆92 · Updated 11 months ago
- [ICML 2024] Assessing the Brittleness of Safety Alignment via Pruning and Low-Rank Modifications ☆76 · Updated last month
- A lightweight library for large language model (LLM) jailbreaking defense.