LLMSecurity / MasterKeyLinks
MASTERKEY is a framework designed to explore and exploit vulnerabilities in large language model chatbots by automating jailbreak attacks and evaluating their defenses.
☆27Updated last year
Alternatives and similar repositories for MasterKey
Users that are interested in MasterKey are comparing it to the libraries listed below
Sorting:
- Official Code for "Baseline Defenses for Adversarial Attacks Against Aligned Language Models"☆28Updated 2 years ago
- ☆36Updated last year
- A toolbox for backdoor attacks.☆22Updated 2 years ago
- Distribution Preserving Backdoor Attack in Self-supervised Learning☆20Updated last year
- This is the code repository of our submission: Understanding the Dark Side of LLMs’ Intrinsic Self-Correction.☆63Updated 11 months ago
- ☆21Updated last year
- ☆37Updated last year
- ☆69Updated 6 months ago
- Code for NeurIPS 2024 Paper "Fight Back Against Jailbreaking via Prompt Adversarial Tuning"☆21Updated 6 months ago
- ☆111Updated 9 months ago
- Reconstructive Neuron Pruning for Backdoor Defense (ICML 2023)☆39Updated last year
- ☆23Updated last year
- This is the code repository for "Uncovering Safety Risks of Large Language Models through Concept Activation Vector"☆46Updated last month
- An unofficial implementation of AutoDAN attack on LLMs (arXiv:2310.15140)☆44Updated last year
- [NDSS 2025] Official code for our paper "Explanation as a Watermark: Towards Harmless and Multi-bit Model Ownership Verification via Wate…☆45Updated last year
- ☆20Updated last year
- ☆18Updated 3 years ago
- ☆49Updated last year
- Official repository for "Robust Prompt Optimization for Defending Language Models Against Jailbreaking Attacks"☆59Updated last year
- Codes for NeurIPS 2021 paper "Adversarial Neuron Pruning Purifies Backdoored Deep Models"☆58Updated 2 years ago
- This is an official repository for Practical Membership Inference Attacks Against Large-Scale Multi-Modal Models: A Pilot Study (ICCV2023…☆24Updated 2 years ago
- [ACL 2024] Defending Large Language Models Against Jailbreaking Attacks Through Goal Prioritization☆29Updated last year
- Code to generate NeuralExecs (prompt injection for LLMs)☆25Updated last month
- [AAAI'25 (Oral)] Jailbreaking Large Vision-language Models via Typographic Visual Prompts☆178Updated 4 months ago
- ☆53Updated last year
- Accept by CVPR 2025 (highlight)☆22Updated 5 months ago
- ☆32Updated last year
- ☆32Updated this week
- ☆26Updated last year
- ☆20Updated 3 years ago