The most comprehensive and accurate LLM jailbreak attack benchmark by far
☆21Mar 22, 2025Updated last year
Alternatives and similar repositories for jailbreak-bench
Users that are interested in jailbreak-bench are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.
Sorting:
- An easy-to-use Python framework to defend against jailbreak prompts.☆21Mar 22, 2025Updated last year
- A novel jailbreak attack unveiling an overlooked attack surface inherently in the chain-of-thought reasoning trajectory of LLMs☆22Apr 3, 2026Updated last month
- Towards Safe LLM with our simple-yet-highly-effective Intention Analysis Prompting☆21Mar 25, 2024Updated 2 years ago
- [arXiv:2311.03191] "DeepInception: Hypnotize Large Language Model to Be Jailbreaker"☆176Feb 20, 2024Updated 2 years ago
- Code of paper: xJailbreak: Representation Space Guided Reinforcement Learning for Interpretable LLM Jailbreaking"☆18Apr 3, 2026Updated last month
- Wordpress hosting with auto-scaling - Free Trial Offer • AdFully Managed hosting for WordPress and WooCommerce businesses that need reliable, auto-scalable performance. Cloudways SafeUpdates now available.
- ☆34Nov 12, 2024Updated last year
- ☆48May 9, 2024Updated 2 years ago
- Chain of Attack: a Semantic-Driven Contextual Multi-Turn attacker for LLM☆39Jan 17, 2025Updated last year
- Code for paper "Defending aginast LLM Jailbreaking via Backtranslation"☆34Aug 16, 2024Updated last year
- Fine-tuning base models to build robust task-specific models☆35Apr 11, 2024Updated 2 years ago
- This is the official code repository for paper "Quantization Aware Attack: Enhancing Transferable Adversarial Attacks by Model Quantizati…☆14Sep 21, 2025Updated 7 months ago
- Understanding the Robustness of Skeleton-based Action Recognition under Adversarial Attack CVPR 2021☆15Mar 8, 2024Updated 2 years ago
- 北京邮电大学信通院C++上机题☆14Feb 20, 2021Updated 5 years ago
- ☆27Jun 5, 2024Updated last year
- Managed hosting for WordPress and PHP on Cloudways • AdManaged hosting for WordPress, Magento, Laravel, or PHP apps, on multiple cloud providers. Deploy in minutes on Cloudways by DigitalOcean.
- BASAR:Black-box Attack on Skeletal Action Recognition, CVPR 2021☆19Feb 18, 2025Updated last year
- ☆16Jul 23, 2024Updated last year
- [CVPR2025] Official Repository for IMMUNE: Improving Safety Against Jailbreaks in Multi-modal LLMs via Inference-Time Alignment☆28Jun 11, 2025Updated 10 months ago
- Visual and Embodied Concepts evaluation benchmark☆21Oct 10, 2023Updated 2 years ago
- [NeurIPS 2025 D&B (Spotlight🌟)] TIME: A Multi-level Benchmark for Temporal Reasoning of LLMs in Real-World Scenario☆31Oct 5, 2025Updated 7 months ago
- The purpose of this project is to build a Decentralized IoT STORage space to Store data of diAPP, enOS and lenOS☆10Feb 24, 2023Updated 3 years ago
- Implementation for "RigorLLM: Resilient Guardrails for Large Language Models against Undesired Content"☆24Jul 28, 2024Updated last year
- [ICLR 2024]Data for "Multilingual Jailbreak Challenges in Large Language Models"☆104Mar 7, 2024Updated 2 years ago
- ☆25Jun 16, 2024Updated last year
- GPUs on demand by Runpod - Special Offer Available • AdRun AI, ML, and HPC workloads on powerful cloud GPUs—without limits or wasted spend. Deploy GPUs in under a minute and pay by the second.
- The official implementation of our NAACL 2024 paper "A Wolf in Sheep’s Clothing: Generalized Nested Jailbreak Prompts can Fool Large Lang…☆158Sep 2, 2025Updated 8 months ago
- Research on "Many-Shot Jailbreaking" in Large Language Models (LLMs). It unveils a novel technique capable of bypassing the safety mechan…☆16Aug 6, 2024Updated last year
- ☆14Jan 23, 2023Updated 3 years ago
- Bag of Tricks: Benchmarking of Jailbreak Attacks on LLMs. Empirical tricks for LLM Jailbreaking. (NeurIPS 2024)☆162Nov 30, 2024Updated last year
- A repo for LLM jailbreak☆14Sep 5, 2023Updated 2 years ago
- Does Refusal Training in LLMs Generalize to the Past Tense? [ICLR 2025]☆79Jan 23, 2025Updated last year
- ☆13Nov 11, 2022Updated 3 years ago
- ☆24Feb 17, 2026Updated 2 months ago
- A Unified Approach to Interpreting and Boosting Adversarial Transferability (ICLR2021)☆31Apr 22, 2022Updated 4 years ago
- Managed Database hosting by DigitalOcean • AdPostgreSQL, MySQL, MongoDB, Kafka, Valkey, and OpenSearch available. Automatically scale up storage and focus on building your apps.
- ☆200Nov 26, 2023Updated 2 years ago
- Official implementation of paper: DrAttack: Prompt Decomposition and Reconstruction Makes Powerful LLM Jailbreakers☆66Aug 25, 2024Updated last year
- [ECCV2024] Boosting Transferability in Vision-Language Attacks via Diversification along the Intersection Region of Adversarial Trajector…☆31Nov 15, 2025Updated 5 months ago
- ☆18Mar 30, 2025Updated last year
- ICL backdoor attack☆17Nov 4, 2024Updated last year
- ☆39May 21, 2024Updated last year
- 小米便签二次开发☆17Sep 7, 2024Updated last year