controllability / jailbreak-evaluationLinks
The jailbreak-evaluation is an easy-to-use Python package for language model jailbreak evaluation.
☆26Updated last year
Alternatives and similar repositories for jailbreak-evaluation
Users that are interested in jailbreak-evaluation are comparing it to the libraries listed below
Sorting:
- LLM security and privacy☆53Updated last year
- Implementation of BEAST adversarial attack for language models (ICML 2024)☆92Updated last year
- A collection of prompt injection mitigation techniques.☆26Updated 2 years ago
- A prompt injection game to collect data for robust ML research☆65Updated 11 months ago
- General research for Dreadnode☆27Updated last year
- Papers about red teaming LLMs and Multimodal models.☆159Updated 7 months ago
- ☆22Updated 2 years ago
- ☆112Updated last month
- Ferret: Faster and Effective Automated Red Teaming with Reward-Based Scoring Technique☆18Updated last year
- This repository provides a benchmark for prompt injection attacks and defenses in LLMs☆373Updated 2 months ago
- Risks and targets for assessing LLMs & LLM vulnerabilities☆33Updated last year
- ☆66Updated 3 months ago
- Automated Safety Testing of Large Language Models☆17Updated 11 months ago
- ☆109Updated 5 months ago
- [NDSS'25 Best Technical Poster] A collection of automated evaluators for assessing jailbreak attempts.☆179Updated 9 months ago
- PAL: Proxy-Guided Black-Box Attack on Large Language Models☆56Updated last year
- A repository of Language Model Vulnerabilities and Exposures (LVEs).☆112Updated last year
- A benchmark for evaluating the robustness of LLMs and defenses to indirect prompt injection attacks.☆96Updated last year
- A prompt defence is a multi-layer defence that can be used to protect your applications against prompt injection attacks.☆21Updated 3 weeks ago
- ☆29Updated 7 months ago
- ☆120Updated 6 months ago
- 🤖🛡️🔍🔒🔑 Tiny package designed to support red teams and penetration testers in exploiting large language model AI solutions.☆28Updated last year
- ☆34Updated last year
- A benchmark for prompt injection detection systems.☆152Updated 3 weeks ago
- The fastest Trust Layer for AI Agents☆146Updated 7 months ago
- LLM | Security | Operations in one github repo with good links and pictures.☆86Updated last week
- [ICML 2024] COLD-Attack: Jailbreaking LLMs with Stealthiness and Controllability☆176Updated last year
- Tree of Attacks (TAP) Jailbreaking Implementation☆117Updated last year
- ☆48Updated last year
- PromptInject is a framework that assembles prompts in a modular fashion to provide a quantitative analysis of the robustness of LLMs to a…☆447Updated last year