Allen-piexl / JailbreakZoo
View external linksLinks

☆164

Alternatives and similar repositories for JailbreakZoo

Users that are interested in JailbreakZoo are comparing it to the libraries listed below

Sorting:

facebookresearch / multimodal-fusion-jailbreaks
View on GitHub
Official repository for the paper "Gradient-based Jailbreak Images for Multimodal Fusion Models" (https//arxiv.org/abs/2410.03489)
☆19Oct 22, 2024Updated last year
usail-hkust / JailTrickBench
View on GitHub
Bag of Tricks: Benchmarking of Jailbreak Attacks on LLMs. Empirical tricks for LLM Jailbreaking. (NeurIPS 2024)
☆163Nov 30, 2024Updated last year
abc03570128 / Jailbreaking-Attack-against-Multimodal-Large-Language-Model
View on GitHub
☆58Aug 11, 2024Updated last year
CryptoAILab / FigStep
View on GitHub
[AAAI'25 (Oral)] Jailbreaking Large Vision-language Models via Typographic Visual Prompts
☆191Jun 26, 2025Updated 7 months ago
thu-coai / JailbreakDefense_GoalPriority
View on GitHub
[ACL 2024] Defending Large Language Models Against Jailbreaking Attacks Through Goal Prioritization
☆29Jul 9, 2024Updated last year
sail-sg / I-FSJ
View on GitHub
Improved Few-Shot Jailbreaking Can Circumvent Aligned Language Models and Their Defenses (NeurIPS 2024)
☆65Jan 11, 2025Updated last year
erfanshayegani / Jailbreak-In-Pieces
View on GitHub
[ICLR 2024 Spotlight 🔥 ] - [ Best Paper Award SoCal NLP 2023 🏆] - Jailbreak in pieces: Compositional Adversarial Attacks on Multi-Modal…
☆79Jun 6, 2024Updated last year
liudaizong / Awesome-LVLM-Attack
View on GitHub
😎 up-to-date & curated list of awesome Attacks on Large-Vision-Language-Models papers, methods & resources.
☆490Jan 27, 2026Updated 2 weeks ago
AI45Lab / ActorAttack
View on GitHub
☆121Feb 3, 2025Updated last year
NY1024 / Jailbreak_GPT4o
View on GitHub
☆26Jun 5, 2024Updated last year
yueliu1999 / FlipAttack
View on GitHub
[ICML 2025] An official source code for paper "FlipAttack: Jailbreak LLMs via Flipping".
☆163May 2, 2025Updated 9 months ago
xirui-li / DrAttack
View on GitHub
Official implementation of paper: DrAttack: Prompt Decomposition and Reconstruction Makes Powerful LLM Jailbreakers
☆65Aug 25, 2024Updated last year
kztakemoto / simbaja
View on GitHub
All in How You Ask for It: Simple Black-Box Method for Jailbreak Attacks
☆18Apr 24, 2024Updated last year
Unispac / Visual-Adversarial-Examples-Jailbreak-Large-Language-Models
View on GitHub
Repository for the Paper (AAAI 2024, Oral) --- Visual Adversarial Examples Jailbreak Large Language Models
☆266May 13, 2024Updated last year
Peiyance / REVOLVE
View on GitHub
Revolve: Optimizing AI Systems by Tracking Response Evolution in Textual Optimization
☆21Dec 13, 2024Updated last year
yjw1029 / Self-Reminder
View on GitHub
Code for our paper "Defending ChatGPT against Jailbreak Attack via Self-Reminder" in NMI.
☆56Nov 13, 2023Updated 2 years ago
DAMO-NLP-SG / multilingual-safety-for-LLMs
View on GitHub
[ICLR 2024]Data for "Multilingual Jailbreak Challenges in Large Language Models"
☆97Mar 7, 2024Updated last year
SheltonLiu-N / AutoDAN
View on GitHub
[ICLR 2024] The official implementation of our ICLR2024 paper "AutoDAN: Generating Stealthy Jailbreak Prompts on Aligned Large Language M…
☆427Jan 22, 2025Updated last year
yueliu1999 / Awesome-Jailbreak-on-LLMs
View on GitHub
Awesome-Jailbreak-on-LLMs is a collection of state-of-the-art, novel, exciting jailbreak methods on LLMs. It contains papers, codes, data…
☆1,205Feb 6, 2026Updated last week
patrickrchao / JailbreakingLLMs
View on GitHub
☆696Jul 2, 2025Updated 7 months ago
JailbreakBench / jailbreakbench
View on GitHub
JailbreakBench: An Open Robustness Benchmark for Jailbreaking Language Models [NeurIPS 2024 Datasets and Benchmarks Track]
☆530Apr 4, 2025Updated 10 months ago
GraySwanAI / nanoGCG
View on GitHub
A fast + lightweight implementation of the GCG algorithm in PyTorch
☆317May 13, 2025Updated 9 months ago
LLM-DRA / DRA
View on GitHub
[USENIX Security'24] Official repository of "Making Them Ask and Answer: Jailbreaking Large Language Models in Few Queries via Disguise a…
☆112Oct 11, 2024Updated last year
chujiezheng / LLM-Safeguard
View on GitHub
Official repository for ICML 2024 paper "On Prompt-Driven Safeguarding for Large Language Models"
☆106May 20, 2025Updated 8 months ago
uw-nsl / ArtPrompt
View on GitHub
[ACL24] Official Repo of Paper `ArtPrompt: ASCII Art-based Jailbreak Attacks against Aligned LLMs`
☆93Aug 15, 2025Updated 6 months ago
isXinLiu / MM-SafetyBench
View on GitHub
Accepted by ECCV 2024
☆186Oct 15, 2024Updated last year
theshi-1128 / ReDPJ
View on GitHub
A novel jailbreak attack unveiling an overlooked attack surface inherently in the chain-of-thought reasoning trajectory of LLMs
☆22Sep 18, 2025Updated 4 months ago
kriti-hippo / red_queen
View on GitHub
Red Queen Dataset and data generation template
☆25Dec 26, 2025Updated last month
tml-epfl / llm-adaptive-attacks
View on GitHub
Jailbreaking Leading Safety-Aligned LLMs with Simple Adaptive Attacks [ICLR 2025]
☆377Jan 23, 2025Updated last year
AoiDragon / HADES
View on GitHub
[ECCV'24 Oral] The official GitHub page for ''Images are Achilles' Heel of Alignment: Exploiting Visual Vulnerabilities for Jailbreaking …
☆38Oct 17, 2024Updated last year
NY1024 / BAP-Jailbreak-Vision-Language-Models-via-Bi-Modal-Adversarial-Prompt
View on GitHub
☆57Jun 5, 2024Updated last year
zhliu0106 / learning-to-refuse
View on GitHub
Official Implementation of "Learning to Refuse: Towards Mitigating Privacy Risks in LLMs"
☆10Dec 13, 2024Updated last year
Lslland / T-Vaccine
View on GitHub
☆19Jun 21, 2025Updated 7 months ago
NJUNLP / ReNeLLM
View on GitHub
The official implementation of our NAACL 2024 paper "A Wolf in Sheep’s Clothing: Generalized Nested Jailbreak Prompts can Fool Large Lang…
☆152Sep 2, 2025Updated 5 months ago
CryptoAILab / JailbreakEval
View on GitHub
[NDSS'25 Best Technical Poster] A collection of automated evaluators for assessing jailbreak attempts.
☆184Apr 1, 2025Updated 10 months ago
isXinLiu / Awesome-MLLM-Safety
View on GitHub
Accepted by IJCAI-24 Survey Track
☆231Aug 25, 2024Updated last year
thunxxx / MLLM-Jailbreak-evaluation-MMJ-Bench
View on GitHub
☆72Mar 30, 2025Updated 10 months ago
Lucas-TY / llm_Implicit_reference
View on GitHub
Official Implementation of implicit reference attack
☆11Oct 16, 2024Updated last year
uw-nsl / SafeDecoding
View on GitHub
Official Repository for ACL 2024 Paper SafeDecoding: Defending against Jailbreak Attacks via Safety-Aware Decoding
☆151Jul 19, 2024Updated last year

Allen-piexl / JailbreakZooView external linksLinks

Alternatives and similar repositories for JailbreakZoo

Allen-piexl / JailbreakZoo
View external linksLinks