theshi-1128/jailbreak-bench
The most comprehensive and accurate LLM jailbreak attack benchmark by far
☆19 · Updated last month
Alternatives and similar repositories for jailbreak-bench:
Users interested in jailbreak-bench are also comparing it to the repositories listed below.
- Code for Findings-EMNLP 2023 paper: Multi-step Jailbreaking Privacy Attacks on ChatGPT ☆33 · Updated last year
- ☆52 · Updated 4 months ago
- Code for ACM MM2024 paper: White-box Multimodal Jailbreaks Against Large Vision-Language Models ☆26 · Updated 4 months ago
- ☆79 · Updated last year
- Official implementation of paper: DrAttack: Prompt Decomposition and Reconstruction Makes Powerful LLM Jailbreakers ☆52 · Updated 8 months ago
- ☆31 · Updated 7 months ago
- ☆18 · Updated 10 months ago
- Red Queen Dataset and data generation template ☆15 · Updated 7 months ago
- Chain of Attack: a Semantic-Driven Contextual Multi-Turn attacker for LLM ☆31 · Updated 3 months ago
- ☆26 · Updated 6 months ago
- [AAAI'25 (Oral)] Jailbreaking Large Vision-language Models via Typographic Visual Prompts ☆136 · Updated 2 months ago
- An easy-to-use Python framework to defend against jailbreak prompts. ☆20 · Updated last month
- ☆82 · Updated 3 months ago
- BackdoorLLM: A Comprehensive Benchmark for Backdoor Attacks on Large Language Models ☆140 · Updated last week
- ☆17 · Updated 2 months ago
- ☆44 · Updated last year
- Repository for Towards Codable Watermarking for Large Language Models ☆36 · Updated last year
- Repository for the paper (AAAI 2024, Oral): Visual Adversarial Examples Jailbreak Large Language Models ☆219 · Updated 11 months ago
- Accepted by ECCV 2024 ☆127 · Updated 6 months ago
- [COLM 2024] JailBreakV-28K: A comprehensive benchmark designed to evaluate the transferability of LLM jailbreak attacks to MLLMs, and fur… ☆58 · Updated 2 weeks ago
- ☆23 · Updated 2 weeks ago
- Safety at Scale: A Comprehensive Survey of Large Model Safety ☆153 · Updated 2 months ago
- An unofficial implementation of the AutoDAN attack on LLMs (arXiv:2310.15140) ☆38 · Updated last year
- [CIKM 2024] Trojan Activation Attack: Attack Large Language Models using Activation Steering for Safety-Alignment. ☆23 · Updated 9 months ago
- Official Code for "Baseline Defenses for Adversarial Attacks Against Aligned Language Models" ☆23 · Updated last year
- [ACL24] Official Repo of Paper `ArtPrompt: ASCII Art-based Jailbreak Attacks against Aligned LLMs` ☆68 · Updated 2 months ago
- LLMs can be Dangerous Reasoners: Analyzing-based Jailbreak Attack on Large Language Models ☆19 · Updated last month
- ☆21 · Updated last year
- TAP: An automated jailbreaking method for black-box LLMs ☆165 · Updated 5 months ago
- MASTERKEY is a framework designed to explore and exploit vulnerabilities in large language model chatbots by automating jailbreak attacks… ☆21 · Updated 7 months ago