LLMSecurity / MasterKey

MASTERKEY is a framework designed to explore and exploit vulnerabilities in large language model chatbots by automating jailbreak attacks and evaluating their defenses.

☆26

Alternatives and similar repositories for MasterKey

Users who are interested in MasterKey are comparing it to the repositories listed below.

Lyz1213 / BadEdit
☆35 Updated 11 months ago
ledllm / ledllm
☆21 Updated last year
PKU-ML / PAT
Code for NeurIPS 2024 Paper "Fight Back Against Jailbreaking via Prompt Adversarial Tuning"
☆19 Updated 5 months ago
neelsjain / baseline-defenses
Official Code for "Baseline Defenses for Adversarial Attacks Against Aligned Language Models"
☆28 Updated last year
Gwinhen / BackdoorVault
A toolbox for backdoor attacks.
☆22 Updated 2 years ago
Gwinhen / DRUPE
Distribution Preserving Backdoor Attack in Self-supervised Learning
☆17 Updated last year
pasquini-dario / LLM_NeuralExec
Code to generate NeuralExecs (prompt injection for LLMs)
☆24 Updated last week
TeamPigeonLab / CS-DJ
Accepted by CVPR 2025 (highlight)
☆18 Updated 4 months ago
thu-coai / JailbreakDefense_GoalPriority
[ACL 2024] Defending Large Language Models Against Jailbreaking Attacks Through Goal Prioritization
☆29 Updated last year
inspire-group / RobustRAG
☆20 Updated last year
wegodev2 / virtual-prompt-injection
Unofficial implementation of "Backdooring Instruction-Tuned Large Language Models with Virtual Prompt Injection"
☆23 Updated last year
lancopku / agent-backdoor-attacks
Code&Data for the paper "Watch Out for Your Agents! Investigating Backdoor Threats to LLM-Based Agents" [NeurIPS 2024]
☆91 Updated last year
qingjiesjtu / USC
This is the code repository of our submission: Understanding the Dark Side of LLMs’ Intrinsic Self-Correction.
☆63 Updated 9 months ago
mengtong0110 / InferDPT
☆32 Updated 6 months ago
bboylyg / RNP
Reconstructive Neuron Pruning for Backdoor Defense (ICML 2023)
☆39 Updated last year
verazuo / prompt-stealing-attack
[USENIX'24] Prompt Stealing Attacks Against Text-to-Image Generation Models
☆45 Updated 9 months ago
OSU-NLP-Group / AmpleGCG
AmpleGCG: Learning a Universal and Transferable Generator of Adversarial Attacks on Both Open and Closed LLM
☆71 Updated 11 months ago
reds-lab / ASSET
This repository is the official implementation of the paper "ASSET: Robust Backdoor Data Detection Across a Multiplicity of Deep Learning…
☆19 Updated 2 years ago
WUSTL-CSPL / LLMJailbreak
☆36 Updated last year
lapisrocks / rpo
Official repository for "Robust Prompt Optimization for Defending Language Models Against Jailbreaking Attacks"
☆58 Updated last year
NY1024 / Foundation-Model-Paper-Notes
☆65 Updated 4 months ago
XuanChen-xc / RLbreaker
Code for "When LLM Meets DRL: Advancing Jailbreaking Efficiency via DRL-guided Search" (NeurIPS 2024)
☆12 Updated 11 months ago
zhangrui4041 / Instruction_Backdoor_Attack
☆27 Updated last year
rotaryhammer / code-autodan
An unofficial implementation of AutoDAN attack on LLMs (arXiv:2310.15140)
☆43 Updated last year
MiracleHH / CBA
Composite Backdoor Attacks Against Large Language Models
☆18 Updated last year
sleeepeer / PoisonedRAG
[USENIX Security 2025] PoisonedRAG: Knowledge Corruption Attacks to Retrieval-Augmented Generation of Large Language Models
☆202 Updated 7 months ago
SproutNan / AI-Safety_SCAV
This is the code repository for "Uncovering Safety Risks of Large Language Models through Concept Activation Vector"
☆44 Updated 10 months ago
tmllab / 2025_ICLR_PiF
☆35 Updated 4 months ago
NY1024 / BAP-Jailbreak-Vision-Language-Models-via-Bi-Modal-Adversarial-Prompt
☆52 Updated last year
Trustworthy-AI-Group / Adversarial_Examples_Papers
A list of recent papers about adversarial learning
☆216 Updated last week