theshi-1128 / jailbreak-bench
The most comprehensive and accurate LLM jailbreak attack benchmark by far
☆19 · Updated 3 weeks ago
Alternatives and similar repositories for jailbreak-bench:
Users interested in jailbreak-bench are comparing it to the repositories listed below.
- An easy-to-use Python framework to defend against jailbreak prompts. ☆20 · Updated 3 weeks ago
- Code for Findings-EMNLP 2023 paper: Multi-step Jailbreaking Privacy Attacks on ChatGPT ☆33 · Updated last year
- Official implementation of the paper "DrAttack: Prompt Decomposition and Reconstruction Makes Powerful LLM Jailbreakers" ☆52 · Updated 7 months ago
- Red Queen Dataset and data generation template ☆15 · Updated 6 months ago
- Repository for Towards Codable Watermarking for Large Language Models ☆36 · Updated last year
- Chain of Attack: a Semantic-Driven Contextual Multi-Turn attacker for LLM ☆29 · Updated 3 months ago
- [NDSS 2025] Official code for our paper "Explanation as a Watermark: Towards Harmless and Multi-bit Model Ownership Verification via Wate… ☆33 · Updated 5 months ago
- Official Code for "Baseline Defenses for Adversarial Attacks Against Aligned Language Models" ☆24 · Updated last year
- [AAAI'25 (Oral)] Jailbreaking Large Vision-language Models via Typographic Visual Prompts ☆130 · Updated last month
- BackdoorLLM: A Comprehensive Benchmark for Backdoor Attacks on Large Language Models ☆131 · Updated last month
- An unofficial implementation of the AutoDAN attack on LLMs (arXiv:2310.15140) ☆37 · Updated last year
- Code and data of the ACL-IJCNLP 2021 paper "Hidden Killer: Invisible Textual Backdoor Attacks with Syntactic Trigger" ☆42 · Updated 2 years ago
- An open-source toolkit for textual backdoor attack and defense (NeurIPS 2022 D&B, Spotlight) ☆175 · Updated 2 years ago
- Multi-bit language model watermarking (NAACL 2024) ☆13 · Updated 6 months ago
- Bag of Tricks: Benchmarking of Jailbreak Attacks on LLMs. Empirical tricks for LLM jailbreaking. (NeurIPS 2024) ☆133 · Updated 4 months ago
- Repository for the AAAI 2024 (Oral) paper: Visual Adversarial Examples Jailbreak Large Language Models ☆215 · Updated 11 months ago
- [ACL 2024] Defending Large Language Models Against Jailbreaking Attacks Through Goal Prioritization ☆22 · Updated 9 months ago
- This repository provides a benchmark for prompt injection attacks and defenses ☆182 · Updated this week
- Safety at Scale: A Comprehensive Survey of Large Model Safety ☆145 · Updated last month
- Repo for SemStamp (NAACL 2024) and k-SemStamp (ACL 2024) ☆20 · Updated 4 months ago