The most comprehensive and accurate LLM jailbreak attack benchmark by far
☆22Mar 22, 2025Updated last year
Alternatives and similar repositories for jailbreak-bench
Users that are interested in jailbreak-bench are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.
Sorting:
- An easy-to-use Python framework to defend against jailbreak prompts.☆21Mar 22, 2025Updated last year
- Red Queen Dataset and data generation template☆27Dec 26, 2025Updated 3 months ago
- Towards Safe LLM with our simple-yet-highly-effective Intention Analysis Prompting☆20Mar 25, 2024Updated 2 years ago
- [arXiv:2311.03191] "DeepInception: Hypnotize Large Language Model to Be Jailbreaker"☆174Feb 20, 2024Updated 2 years ago
- Code of paper: xJailbreak: Representation Space Guided Reinforcement Learning for Interpretable LLM Jailbreaking"☆18Feb 17, 2026Updated last month
- Managed hosting for WordPress and PHP on Cloudways • AdManaged hosting with the flexibility to host WordPress, Magento, Laravel, or PHP apps, on multiple cloud providers. Cloudways by DigitalOcean.
- ☆34Nov 12, 2024Updated last year
- ☆47May 9, 2024Updated last year
- PyTorch Implementation of the paper "Defining and Quantifying the Emergence of Sparse Concepts in DNNs" (CVPR 2023)☆12Dec 24, 2023Updated 2 years ago
- Chain of Attack: a Semantic-Driven Contextual Multi-Turn attacker for LLM☆39Jan 17, 2025Updated last year
- Code for paper "Defending aginast LLM Jailbreaking via Backtranslation"☆34Aug 16, 2024Updated last year
- Fine-tuning base models to build robust task-specific models☆34Apr 11, 2024Updated last year
- ☆12Sep 29, 2024Updated last year
- Understanding the Robustness of Skeleton-based Action Recognition under Adversarial Attack CVPR 2021☆14Mar 8, 2024Updated 2 years ago
- ☆12Apr 25, 2025Updated 11 months ago
- End-to-end encrypted cloud storage - Proton Drive • AdSpecial offer: 40% Off Yearly / 80% Off First Month. Protect your most important files, photos, and documents from prying eyes.
- ☆14Mar 9, 2025Updated last year
- ☆27Jun 5, 2024Updated last year
- BASAR:Black-box Attack on Skeletal Action Recognition, CVPR 2021☆20Feb 18, 2025Updated last year
- [CVPR2025] Official Repository for IMMUNE: Improving Safety Against Jailbreaks in Multi-modal LLMs via Inference-Time Alignment☆27Jun 11, 2025Updated 9 months ago
- Code for paper: PoisonPrompt: Backdoor Attack on Prompt-based Large Language Models, IEEE ICASSP 2024. Demo//124.220.228.133:11107☆20Aug 10, 2024Updated last year
- Visual and Embodied Concepts evaluation benchmark☆21Oct 10, 2023Updated 2 years ago
- [NeurIPS 2025 D&B (Spotlight🌟)] TIME: A Multi-level Benchmark for Temporal Reasoning of LLMs in Real-World Scenario☆30Oct 5, 2025Updated 5 months ago
- Implementation for "RigorLLM: Resilient Guardrails for Large Language Models against Undesired Content"☆23Jul 28, 2024Updated last year
- The purpose of this project is to build a Decentralized IoT STORage space to Store data of diAPP, enOS and lenOS☆11Feb 24, 2023Updated 3 years ago
- Open source password manager - Proton Pass • AdSecurely store, share, and autofill your credentials with Proton Pass, the end-to-end encrypted password manager trusted by millions.
- [ICLR 2024]Data for "Multilingual Jailbreak Challenges in Large Language Models"☆101Mar 7, 2024Updated 2 years ago
- The official implementation of our NAACL 2024 paper "A Wolf in Sheep’s Clothing: Generalized Nested Jailbreak Prompts can Fool Large Lang…☆155Sep 2, 2025Updated 6 months ago
- Open One-Stop Moderation Tools for Safety Risks, Jailbreaks, and Refusals of LLMs☆113Dec 2, 2024Updated last year
- Research on "Many-Shot Jailbreaking" in Large Language Models (LLMs). It unveils a novel technique capable of bypassing the safety mechan…☆16Aug 6, 2024Updated last year
- ☆14Jan 23, 2023Updated 3 years ago
- Bag of Tricks: Benchmarking of Jailbreak Attacks on LLMs. Empirical tricks for LLM Jailbreaking. (NeurIPS 2024)☆160Nov 30, 2024Updated last year
- A repo for LLM jailbreak☆14Sep 5, 2023Updated 2 years ago
- An evolutionary, coverage-guided greybox network protocol fuzzer☆21Aug 31, 2021Updated 4 years ago
- Does Refusal Training in LLMs Generalize to the Past Tense? [ICLR 2025]☆79Jan 23, 2025Updated last year
- 1-Click AI Models by DigitalOcean Gradient • AdDeploy popular AI models on DigitalOcean Gradient GPU virtual machines with just a single click and start building anything your business needs.
- A collection of client-side libraries with HTML injection vulnerabilities and DOM clobbering gadgets.☆48Aug 31, 2025Updated 6 months ago
- ☆24Feb 17, 2026Updated last month
- ☆196Nov 26, 2023Updated 2 years ago
- A curated collection of research and techniques for protecting intellectual property of large language models, including watermarking, fi…☆47Feb 15, 2026Updated last month
- [ECCV2024] Boosting Transferability in Vision-Language Attacks via Diversification along the Intersection Region of Adversarial Trajector…☆31Nov 15, 2025Updated 4 months ago
- ICL backdoor attack☆17Nov 4, 2024Updated last year
- ☆18Mar 30, 2025Updated last year