theshi-1128 / jailbreak-bench
The most comprehensive and accurate LLM jailbreak attack benchmark by far
☆16 · Updated 4 months ago
Alternatives and similar repositories for jailbreak-bench:
Users interested in jailbreak-bench are comparing it to the repositories listed below.
- Red Queen Dataset and data generation template ☆12 · Updated 5 months ago
- An easy-to-use Python framework to defend against jailbreak prompts. ☆19 · Updated 6 months ago
- LLMs can be Dangerous Reasoners: Analyzing-based Jailbreak Attack on Large Language Models ☆17 · Updated last week
- ☆78 · Updated 11 months ago
- ☆26 · Updated 5 months ago
- ☆42 · Updated 2 months ago
- Code for Findings-EMNLP 2023 paper: Multi-step Jailbreaking Privacy Attacks on ChatGPT ☆30 · Updated last year
- [AAAI'25 (Oral)] Jailbreaking Large Vision-language Models via Typographic Visual Prompts ☆121 · Updated 2 weeks ago
- Code implementation for the paper "FP-VEC: Fingerprinting Large Language Models via Efficient Vector Addition" ☆10 · Updated 4 months ago
- ☆51 · Updated 2 months ago
- [USENIX Security'24] Official repository of "Making Them Ask and Answer: Jailbreaking Large Language Models in Few Queries via Disguise a… ☆75 · Updated 5 months ago
- Official implementation of the paper: DrAttack: Prompt Decomposition and Reconstruction Makes Powerful LLM Jailbreakers ☆47 · Updated 6 months ago
- Code for ACM MM2024 paper: White-box Multimodal Jailbreaks Against Large Vision-Language Models ☆23 · Updated 2 months ago
- ☆24 · Updated 4 months ago
- ☆77 · Updated last month
- Code to generate NeuralExecs (prompt injection for LLMs) ☆20 · Updated 3 months ago
- BackdoorLLM: A Comprehensive Benchmark for Backdoor Attacks on Large Language Models ☆110 · Updated 3 weeks ago
- Chain of Attack: a Semantic-Driven Contextual Multi-Turn attacker for LLM ☆29 · Updated last month
- [ECCV'24 Oral] The official GitHub page for "Images are Achilles' Heel of Alignment: Exploiting Visual Vulnerabilities for Jailbreaking … ☆17 · Updated 4 months ago
- ☆18 · Updated 8 months ago
- An unofficial implementation of the AutoDAN attack on LLMs (arXiv:2310.15140) ☆35 · Updated last year
- TAP: An automated jailbreaking method for black-box LLMs ☆150 · Updated 3 months ago
- MASTERKEY is a framework designed to explore and exploit vulnerabilities in large language model chatbots by automating jailbreak attacks… ☆18 · Updated 6 months ago
- [ICLR24] Official repo of BadChain: Backdoor Chain-of-Thought Prompting for Large Language Models ☆28 · Updated 7 months ago
- ☆14 · Updated last week
- [USENIX Security '24] An LLM-Assisted Easy-to-Trigger Backdoor Attack on Code Completion Models: Injecting Disguised Vulnerabilities agai… ☆41 · Updated 4 months ago
- [NDSS'25 Poster] A collection of automated evaluators for assessing jailbreak attempts. ☆120 · Updated this week