JailbreakBench/jailbreakbench

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/JailbreakBench/jailbreakbench)

JailbreakBench / jailbreakbench

JailbreakBench: An Open Robustness Benchmark for Jailbreaking Language Models [NeurIPS 2024 Datasets and Benchmarks Track]

☆634

Alternatives and similar repositories for jailbreakbench

Users that are interested in jailbreakbench are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

centerforaisafety / HarmBench
View on GitHub
HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal
☆1,013Aug 16, 2024Updated last year
tml-epfl / llm-adaptive-attacks
View on GitHub
Jailbreaking Leading Safety-Aligned LLMs with Simple Adaptive Attacks [ICLR 2025]
☆391Jan 23, 2025Updated last year
EasyJailbreak / EasyJailbreak
View on GitHub
An easy-to-use Python framework to generate adversarial jailbreak prompts.
☆874Mar 30, 2026Updated 3 months ago
JailbreakBench / artifacts
View on GitHub
Jailbreak artifacts for JailbreakBench
☆97Nov 6, 2024Updated last year
SheltonLiu-N / AutoDAN
View on GitHub
[ICLR 2024] The official implementation of our ICLR2024 paper "AutoDAN: Generating Stealthy Jailbreak Prompts on Aligned Large Language M…
☆453Jan 22, 2025Updated last year
Deploy open-source AI quickly and easily - Special Bonus Offer • Ad
Runpod Hub is built for open source. One-click deployment and autoscaling endpoints without provisioning your own infrastructure.
patrickrchao / JailbreakingLLMs
View on GitHub
☆757Jul 2, 2025Updated last year
RICommunity / TAP
View on GitHub
TAP: An automated jailbreaking method for black-box LLMs
☆241Dec 10, 2024Updated last year
GraySwanAI / nanoGCG
View on GitHub
A fast + lightweight implementation of the GCG algorithm in PyTorch
☆344May 13, 2025Updated last year
usail-hkust / JailTrickBench
View on GitHub
Bag of Tricks: Benchmarking of Jailbreak Attacks on LLMs. Empirical tricks for LLM Jailbreaking. (NeurIPS 2024)
☆167Nov 30, 2024Updated last year
yueliu1999 / Awesome-Jailbreak-on-LLMs
View on GitHub
Awesome-Jailbreak-on-LLMs is a collection of state-of-the-art, novel, exciting jailbreak methods on LLMs. It contains papers, codes, data…
☆1,540Updated this week
llm-attacks / llm-attacks
View on GitHub
Universal and Transferable Attacks on Aligned Language Models
☆4,745Aug 2, 2024Updated last year
weizeming / momentum-attack-llm
View on GitHub
☆25Jan 17, 2025Updated last year
AI45Lab / ActorAttack
View on GitHub
☆135Jun 29, 2026Updated 3 weeks ago
uw-nsl / SafeDecoding
View on GitHub
Official Repository for ACL 2024 Paper SafeDecoding: Defending against Jailbreak Attacks via Safety-Aware Decoding
☆154Jul 19, 2024Updated 2 years ago
End-to-end encrypted cloud storage - Proton Drive • Ad
Special offer: 40% Off Yearly / 80% Off First Month. Protect your most important files, photos, and documents from prying eyes.
Yu-Fangxu / COLD-Attack
View on GitHub
[ICML 2024] COLD-Attack: Jailbreaking LLMs with Stealthiness and Controllability
☆176Dec 18, 2024Updated last year
GraySwanAI / circuit-breakers
View on GitHub
Improving Alignment and Robustness with Circuit Breakers
☆266Sep 24, 2024Updated last year
Unispac / shallow-vs-deep-alignment
View on GitHub
Official Repository for The Paper: Safety Alignment Should Be Made More Than Just a Few Tokens Deep
☆188Apr 23, 2025Updated last year
Princeton-SysML / Jailbreak_LLM
View on GitHub
☆203Nov 26, 2023Updated 2 years ago
CryptoAILab / Awesome-LM-SSP
View on GitHub
A reading list for large models safety, security, and privacy (including Awesome LLM Security, Safety, etc.).
☆2,021Jun 17, 2026Updated last month
allenai / wildguard
View on GitHub
Open One-Stop Moderation Tools for Safety Risks, Jailbreaks, and Refusals of LLMs
☆131Dec 2, 2024Updated last year
sherdencooper / GPTFuzz
View on GitHub
Official repo for GPTFUZZER : Red Teaming Large Language Models with Auto-Generated Jailbreak Prompts
☆601Feb 27, 2026Updated 4 months ago
chawins / llm-sp
View on GitHub
Papers and resources related to the security and privacy of LLMs 🤖
☆579Jun 8, 2025Updated last year
facebookresearch / advprompter
View on GitHub
Official implementation of AdvPrompter https//arxiv.org/abs/2404.16873
☆183May 6, 2024Updated 2 years ago
AI Agents on DigitalOcean Gradient AI Platform • Ad
Build production-ready AI agents using customizable tools or access multiple LLMs through a single endpoint. Create custom knowledge bases or connect external data.
SheltonLiu-N / Universal-Prompt-Injection
View on GitHub
The official implementation of our pre-print paper "Automatic and Universal Prompt Injection Attacks against Large Language Models".
☆73Oct 23, 2024Updated last year
facebookresearch / jailbreak-objectives
View on GitHub
Code and data to go with the Zhu et al. paper "An Objective for Nuanced LLM Jailbreaks"
☆37Jul 2, 2026Updated 3 weeks ago
arobey1 / smooth-llm
View on GitHub
☆135Nov 13, 2023Updated 2 years ago
huizhang-L / CodeChameleon
View on GitHub
☆30Mar 20, 2024Updated 2 years ago
dsbowen / strong_reject
View on GitHub
☆146Jul 7, 2025Updated last year
tmllab / 2025_ICLR_PiF
View on GitHub
☆40May 17, 2025Updated last year
isXinLiu / MM-SafetyBench
View on GitHub
Accepted by ECCV 2024
☆218Oct 15, 2024Updated last year
CryptoAILab / FigStep
View on GitHub
[AAAI'25 (Oral)] Jailbreaking Large Vision-language Models via Typographic Visual Prompts
☆211Jun 26, 2025Updated last year
LLM-Tuning-Safety / LLMs-Finetuning-Safety
View on GitHub
We jailbreak GPT-3.5 Turbo’s safety guardrails by fine-tuning it on only 10 adversarially designed examples, at a cost of less than $0.20…
☆358Feb 23, 2024Updated 2 years ago
Deploy to Railway using AI coding agents - Free Credits Offer • Ad
Use Claude Code, Codex, OpenCode, and more. Autonomous software development now has the infrastructure to match with Railway.
NJUNLP / ReNeLLM
View on GitHub
The official implementation of our NAACL 2024 paper "A Wolf in Sheep’s Clothing: Generalized Nested Jailbreak Prompts can Fool Large Lang…
☆163Sep 2, 2025Updated 10 months ago
Allen-piexl / JailbreakZoo
View on GitHub
☆171Sep 2, 2024Updated last year
chujiezheng / LLM-Safeguard
View on GitHub
Official repository for ICML 2024 paper "On Prompt-Driven Safeguarding for Large Language Models"
☆108May 20, 2025Updated last year
ydyjya / Awesome-LLM-Safety
View on GitHub
A curated list of safety-related papers, articles, and resources focused on Large Language Models (LLMs). This repository aims to provide…
☆1,889Jul 12, 2026Updated last week
CHATS-lab / persuasive_jailbreaker
View on GitHub
Persuasive Jailbreaker: we can persuade LLMs to jailbreak them!
☆363Oct 17, 2025Updated 9 months ago
erfanshayegani / Jailbreak-In-Pieces
View on GitHub
[ICLR 2024 Spotlight 🔥 ] - [ Best Paper Award SoCal NLP 2023 🏆] - Jailbreak in pieces: Compositional Adversarial Attacks on Multi-Modal…
☆93Jun 6, 2024Updated 2 years ago
andyrdt / refusal_direction
View on GitHub
Code and results accompanying the paper "Refusal in Language Models Is Mediated by a Single Direction".
☆424Jun 13, 2025Updated last year