The most comprehensive and accurate LLM jailbreak attack benchmark by far
☆21Mar 22, 2025Updated last year
Alternatives and similar repositories for jailbreak-bench
Users that are interested in jailbreak-bench are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.
Sorting:
- Red Queen Dataset and data generation template☆26Dec 26, 2025Updated 3 months ago
- A novel jailbreak attack unveiling an overlooked attack surface inherently in the chain-of-thought reasoning trajectory of LLMs☆22Apr 3, 2026Updated 2 weeks ago
- [arXiv:2311.03191] "DeepInception: Hypnotize Large Language Model to Be Jailbreaker"☆176Feb 20, 2024Updated 2 years ago
- Code of paper: xJailbreak: Representation Space Guided Reinforcement Learning for Interpretable LLM Jailbreaking"☆18Apr 3, 2026Updated 2 weeks ago
- ☆48May 9, 2024Updated last year
- AI Agents on DigitalOcean Gradient AI Platform • AdBuild production-ready AI agents using customizable tools or access multiple LLMs through a single endpoint. Create custom knowledge bases or connect external data.
- PyTorch Implementation of the paper "Defining and Quantifying the Emergence of Sparse Concepts in DNNs" (CVPR 2023)☆12Dec 24, 2023Updated 2 years ago
- Chain of Attack: a Semantic-Driven Contextual Multi-Turn attacker for LLM☆39Jan 17, 2025Updated last year
- SVIP: Towards Verifiable Inference of Open-Source Large Language Models☆15Jun 3, 2025Updated 10 months ago
- Code for paper "Defending aginast LLM Jailbreaking via Backtranslation"☆34Aug 16, 2024Updated last year
- ☆12Sep 29, 2024Updated last year
- [CVPR 2024] "Transferable Structural Sparse Adversarial Attack Via Exact Group Sparsity Training", Di Ming, Peng Ren, Yunlong Wang, Xin …☆16Jun 11, 2024Updated last year
- ☆12Apr 25, 2025Updated 11 months ago
- BASAR:Black-box Attack on Skeletal Action Recognition, CVPR 2021☆19Feb 18, 2025Updated last year
- [CVPR2025] Official Repository for IMMUNE: Improving Safety Against Jailbreaks in Multi-modal LLMs via Inference-Time Alignment☆28Jun 11, 2025Updated 10 months ago
- Serverless GPU API endpoints on Runpod - Bonus Credits • AdSkip the infrastructure headaches. Auto-scaling, pay-as-you-go, no-ops approach lets you focus on innovating your application.
- Implementation of paper 'Defending Large Language Models against Jailbreak Attacks via Semantic Smoothing'☆24Jun 9, 2024Updated last year
- Code for paper: PoisonPrompt: Backdoor Attack on Prompt-based Large Language Models, IEEE ICASSP 2024. Demo//124.220.228.133:11107☆21Aug 10, 2024Updated last year
- 批量挖掘漏洞☆19May 23, 2021Updated 4 years ago
- Visual and Embodied Concepts evaluation benchmark☆21Oct 10, 2023Updated 2 years ago
- [NeurIPS 2025 D&B (Spotlight🌟)] TIME: A Multi-level Benchmark for Temporal Reasoning of LLMs in Real-World Scenario☆30Oct 5, 2025Updated 6 months ago
- AdaICL: Which Examples to Annotate of In-Context Learning? Towards Effective and Efficient Selection☆19Oct 30, 2023Updated 2 years ago
- Implementation for "RigorLLM: Resilient Guardrails for Large Language Models against Undesired Content"☆23Jul 28, 2024Updated last year
- ☆25Jun 16, 2024Updated last year
- The official implementation of our NAACL 2024 paper "A Wolf in Sheep’s Clothing: Generalized Nested Jailbreak Prompts can Fool Large Lang…☆158Sep 2, 2025Updated 7 months ago
- Wordpress hosting with auto-scaling - Free Trial • AdFully Managed hosting for WordPress and WooCommerce businesses that need reliable, auto-scalable performance. Cloudways SafeUpdates now available.
- Open One-Stop Moderation Tools for Safety Risks, Jailbreaks, and Refusals of LLMs☆116Dec 2, 2024Updated last year
- Research on "Many-Shot Jailbreaking" in Large Language Models (LLMs). It unveils a novel technique capable of bypassing the safety mechan…☆16Aug 6, 2024Updated last year
- Bag of Tricks: Benchmarking of Jailbreak Attacks on LLMs. Empirical tricks for LLM Jailbreaking. (NeurIPS 2024)☆160Nov 30, 2024Updated last year
- A repo for LLM jailbreak☆14Sep 5, 2023Updated 2 years ago
- An evolutionary, coverage-guided greybox network protocol fuzzer☆21Aug 31, 2021Updated 4 years ago
- ☆13Nov 11, 2022Updated 3 years ago
- A collection of client-side libraries with HTML injection vulnerabilities and DOM clobbering gadgets.☆49Aug 31, 2025Updated 7 months ago
- [NeurIPS 2023] Differentially Private Image Classification by Learning Priors from Random Processes☆12Jun 12, 2023Updated 2 years ago
- ☆24Feb 17, 2026Updated 2 months ago
- 1-Click AI Models by DigitalOcean Gradient • AdDeploy popular AI models on DigitalOcean Gradient GPU virtual machines with just a single click. Zero configuration with optimized deployments.
- A Unified Approach to Interpreting and Boosting Adversarial Transferability (ICLR2021)☆31Apr 22, 2022Updated 3 years ago
- Official implementation of paper: DrAttack: Prompt Decomposition and Reconstruction Makes Powerful LLM Jailbreakers☆66Aug 25, 2024Updated last year
- A curated collection of research and techniques for protecting intellectual property of large language models, including watermarking, fi…☆47Feb 15, 2026Updated 2 months ago
- [ECCV2024] Boosting Transferability in Vision-Language Attacks via Diversification along the Intersection Region of Adversarial Trajector…☆31Nov 15, 2025Updated 5 months ago
- ICL backdoor attack☆17Nov 4, 2024Updated last year
- OCR识别内容后直 接请求GPT获取结果的便捷工具。☆10May 5, 2023Updated 2 years ago
- A curated list of browser fuzzing researches, papers, tools, ...☆14Jan 30, 2023Updated 3 years ago