theshi-1128 / jailbreak-bench
The most comprehensive and accurate LLM jailbreak attack benchmark by far
☆13 · Updated 2 months ago
Alternatives and similar repositories for jailbreak-bench:
Users interested in jailbreak-bench are also comparing it to the repositories listed below.
- An easy-to-use Python framework to defend against jailbreak prompts. ☆19 · Updated 4 months ago
- ☆33 · Updated last month
- Red Queen Dataset and data generation template ☆10 · Updated 3 months ago
- ☆77 · Updated 9 months ago
- ☆23 · Updated 4 months ago
- Chain of Attack: a Semantic-Driven Contextual Multi-Turn attacker for LLM ☆28 · Updated 2 weeks ago
- [USENIX Security'24] Official repository of "Making Them Ask and Answer: Jailbreaking Large Language Models in Few Queries via Disguise a…" ☆62 · Updated 3 months ago
- A toolbox for backdoor attacks. ☆20 · Updated 2 years ago
- Composite Backdoor Attacks Against Large Language Models ☆11 · Updated 9 months ago
- Code for ACM MM2024 paper: White-box Multimodal Jailbreaks Against Large Vision-Language Models ☆20 · Updated last month
- Official code for our NDSS paper "Explanation as a Watermark: Towards Harmless and Multi-bit Model Ownership Verification via Watermarkin…" ☆25 · Updated 2 months ago
- An unofficial implementation of the AutoDAN attack on LLMs (arXiv:2310.15140) ☆32 · Updated 11 months ago
- ☆21 · Updated 3 months ago
- [AAAI'25 (Oral)] Jailbreaking Large Vision-Language Models via Typographic Visual Prompts ☆104 · Updated last month
- Repository for the paper (AAAI 2024, Oral) "Visual Adversarial Examples Jailbreak Large Language Models" ☆198 · Updated 8 months ago
- ☆12 · Updated last year
- 😎 Up-to-date, curated list of awesome papers, methods, and resources on attacks against Large Vision-Language Models. ☆193 · Updated 3 weeks ago
- MASTERKEY is a framework designed to explore and exploit vulnerabilities in large language model chatbots by automating jailbreak attacks… ☆18 · Updated 4 months ago
- ☆16 · Updated 7 months ago
- ☆70 · Updated last week
- Figure It Out: Analyzing-based Jailbreak Attack on Large Language Models ☆17 · Updated 2 months ago
- Distribution Preserving Backdoor Attack in Self-supervised Learning ☆14 · Updated last year
- ☆49 · Updated last month
- Code repository for the paper [USENIX Security 2023] "Towards A Proactive ML Approach for Detecting Backdoor Poison Samples" ☆23 · Updated last year
- Accepted by ECCV 2024 ☆92 · Updated 3 months ago
- Code for the Findings-EMNLP 2023 paper "Multi-step Jailbreaking Privacy Attacks on ChatGPT" ☆29 · Updated last year
- Benchmarking Physical Risk Awareness of Foundation Model-based Embodied AI Agents ☆13 · Updated 2 months ago
- Official implementation of the paper "ASSET: Robust Backdoor Data Detection Across a Multiplicity of Deep Learning…" ☆17 · Updated last year
- ☆14 · Updated last year
- Code and data for the ACL-IJCNLP 2021 paper "Hidden Killer: Invisible Textual Backdoor Attacks with Syntactic Trigger" ☆41 · Updated 2 years ago