LLMSecurity / MasterKey
MASTERKEY is a framework designed to explore and exploit vulnerabilities in large language model chatbots by automating jailbreak attacks and evaluating their defenses.
☆10 Updated 2 months ago
Related projects
Alternatives and complementary repositories for MasterKey
- A toolbox for backdoor attacks. ☆19 Updated last year
- Text-CRS: A Generalized Certified Robustness Framework against Textual Adversarial Attacks (IEEE S&P 2024) ☆31 Updated 7 months ago
- WaNet - Imperceptible Warping-based Backdoor Attack (ICLR 2021) ☆111 Updated this week
- multi-bit language model watermarking (NAACL 24) ☆11 Updated last month
- Repository for "Towards Codable Watermarking for Large Language Models" ☆29 Updated last year
- Invisible Backdoor Attack with Sample-Specific Triggers ☆90 Updated 2 years ago
- A reproduction of the Neural Cleanse paper, which is really simple yet effective; published at Oakland (IEEE S&P) ☆30 Updated 3 years ago
- A PyTorch Implementation of Some Backdoor Attack Algorithms, Including BadNets, SIG, FIBA, FTrojan ... ☆13 Updated 6 months ago
- A curated list of papers & resources on backdoor attacks and defenses in deep learning. ☆176 Updated 7 months ago
- Distribution Preserving Backdoor Attack in Self-supervised Learning ☆11 Updated 9 months ago
- This is an official repository for Practical Membership Inference Attacks Against Large-Scale Multi-Modal Models: A Pilot Study (ICCV2023… ☆20 Updated last year
- [IEEE S&P'24] ODSCAN: Backdoor Scanning for Object Detection Models ☆11 Updated 5 months ago
- This is the implementation for CVPR 2022 Oral paper "Better Trigger Inversion Optimization in Backdoor Scanning." ☆24 Updated 2 years ago
- Official Implementation of ICLR 2022 paper, "Adversarial Unlearning of Backdoors via Implicit Hypergradient" ☆50 Updated last year
- Code for paper: "PromptCARE: Prompt Copyright Protection by Watermark Injection and Verification", IEEE S&P 2024. ☆28 Updated 3 months ago