controllability / jailbreak-evaluation
The jailbreak-evaluation is an easy-to-use Python package for language model jailbreak evaluation.
☆19Updated 2 months ago
Alternatives and similar repositories for jailbreak-evaluation:
Users that are interested in jailbreak-evaluation are comparing it to the libraries listed below
- The official implementation of our pre-print paper "Automatic and Universal Prompt Injection Attacks against Large Language Models".☆39Updated 2 months ago
- LLM security and privacy☆43Updated 3 months ago
- This repository provides implementation to formalize and benchmark Prompt Injection attacks and defenses☆163Updated this week
- A collection of automated evaluators for assessing jailbreak attempts.☆92Updated last week
- ☆42Updated 8 months ago
- Official implementation of paper: DrAttack: Prompt Decomposition and Reconstruction Makes Powerful LLM Jailbreakers☆41Updated 4 months ago
- A benchmark for evaluating the robustness of LLMs and defenses to indirect prompt injection attacks.☆53Updated 9 months ago
- ☆47Updated 6 months ago
- ☆18Updated last year
- The opensoure repository of FuzzLLM☆18Updated 8 months ago
- Implementation of BEAST adversarial attack for language models (ICML 2024)☆79Updated 8 months ago
- Fine-tuning base models to build robust task-specific models☆24Updated 9 months ago
- ☆63Updated 3 months ago
- ☆76Updated last year
- ☆28Updated last month
- PAL: Proxy-Guided Black-Box Attack on Large Language Models☆48Updated 5 months ago
- A prompt injection game to collect data for robust ML research☆49Updated 3 weeks ago
- ☆45Updated last month
- [NeurIPS 2024] Official implementation for "AgentPoison: Red-teaming LLM Agents via Memory or Knowledge Base Backdoor Poisoning"☆89Updated 3 weeks ago
- This project investigates the security of large language models by performing binary classification of a set of input prompts to discover…☆36Updated last year
- [ICML 2024] COLD-Attack: Jailbreaking LLMs with Stealthiness and Controllability☆128Updated last month
- Papers about red teaming LLMs and Multimodal models.☆91Updated last month
- Risks and targets for assessing LLMs & LLM vulnerabilities☆30Updated 7 months ago
- future-proof vulnerability detection benchmark, based on CVEs in open-source repos☆46Updated last week
- Official repository for the paper "ALERT: A Comprehensive Benchmark for Assessing Large Language Models’ Safety through Red Teaming"☆36Updated 3 months ago
- General research for Dreadnode☆19Updated 7 months ago
- Package to optimize Adversarial Attacks against (Large) Language Models with Varied Objectives☆66Updated 10 months ago
- A Dynamic Environment to Evaluate Attacks and Defenses for LLM Agents.☆74Updated this week
- 🤖🛡️🔍🔒🔑 Tiny package designed to support red teams and penetration testers in exploiting large language model AI solutions.☆18Updated 8 months ago
- ☆26Updated 2 months ago