Code for Findings-EMNLP 2023 paper: Multi-step Jailbreaking Privacy Attacks on ChatGPT
☆36Oct 15, 2023Updated 2 years ago
Alternatives and similar repositories for LLM-Multistep-Jailbreak
Users that are interested in LLM-Multistep-Jailbreak are comparing it to the libraries listed below
Sorting:
- Official Code for ACL 2023 paper: "Ethicist: Targeted Training Data Extraction Through Loss Smoothed Soft Prompting and Calibrated Confid…☆23May 8, 2023Updated 2 years ago
- ☆12Dec 22, 2025Updated 2 months ago
- HKUST COMP4651 Fall 18/19: Cloud Computing and Big Data Systems☆22Feb 6, 2019Updated 7 years ago
- [ICLR 2024 Spotlight 🔥 ] - [ Best Paper Award SoCal NLP 2023 🏆] - Jailbreak in pieces: Compositional Adversarial Attacks on Multi-Modal…☆80Jun 6, 2024Updated last year
- The repository contains the code for analysing the leakage of personally identifiable (PII) information from the output of next word pred…☆104Aug 13, 2024Updated last year
- Official repository for "Robust Prompt Optimization for Defending Language Models Against Jailbreaking Attacks"☆61Aug 8, 2024Updated last year
- Consuming Resrouce via Auto-generation for LLM-DoS Attack under Black-box Settings☆18Sep 1, 2025Updated 6 months ago
- code space of paper "Safety Layers in Aligned Large Language Models: The Key to LLM Security" (ICLR 2025)☆22Apr 26, 2025Updated 10 months ago
- We jailbreak GPT-3.5 Turbo’s safety guardrails by fine-tuning it on only 10 adversarially designed examples, at a cost of less than $0.20…☆341Feb 23, 2024Updated 2 years ago
- [ICML 2025] An official source code for paper "FlipAttack: Jailbreak LLMs via Flipping".☆165May 2, 2025Updated 10 months ago
- Red Queen Dataset and data generation template☆26Dec 26, 2025Updated 2 months ago
- CVPR 2023 generalist☆16Oct 25, 2023Updated 2 years ago
- Code for "When LLM Meets DRL: Advancing Jailbreaking Efficiency via DRL-guided Search" (NeurIPS 2024)☆17Oct 22, 2024Updated last year
- ☆121Feb 3, 2025Updated last year
- [AAAI'25 (Oral)] Jailbreaking Large Vision-language Models via Typographic Visual Prompts☆191Jun 26, 2025Updated 8 months ago
- Bag of Tricks: Benchmarking of Jailbreak Attacks on LLMs. Empirical tricks for LLM Jailbreaking. (NeurIPS 2024)☆163Nov 30, 2024Updated last year
- ☆17Sep 4, 2024Updated last year
- Code for the CVPR 2020 article "Adversarial Vertex mixup: Toward Better Adversarially Robust Generalization"☆13Jul 13, 2020Updated 5 years ago
- ☆18Nov 30, 2022Updated 3 years ago
- Code for ACL 2024 paper: PrivLM-Bench: A Multi-level Privacy Evaluation Benchmark for Language Models.☆16Feb 5, 2025Updated last year
- SG-Bench: Evaluating LLM Safety Generalization Across Diverse Tasks and Prompt Types☆25Nov 29, 2024Updated last year
- ☆23Jan 17, 2025Updated last year
- ☆14Jul 11, 2019Updated 6 years ago
- Code for paper: PoisonPrompt: Backdoor Attack on Prompt-based Large Language Models, IEEE ICASSP 2024. Demo//124.220.228.133:11107☆20Aug 10, 2024Updated last year
- ☆196Nov 26, 2023Updated 2 years ago
- ☆64Jun 1, 2025Updated 9 months ago
- Privacy Risks of Securing Machine Learning Models against Adversarial Examples☆46Nov 25, 2019Updated 6 years ago
- Divide-and-Conquer Attack: Harnessing the Power of LLM to Bypass the Censorship of Text-to-Image Generation Mode☆18Feb 16, 2025Updated last year
- [CCS'22] SSLGuard: A Watermarking Scheme for Self-supervised Learning Pre-trained Encoders☆18Jul 12, 2022Updated 3 years ago
- ☆48May 9, 2024Updated last year
- [NeurIPS'24] RedCode: Risky Code Execution and Generation Benchmark for Code Agents☆66Nov 14, 2025Updated 3 months ago
- Code for Findings-ACL 2023 paper: Sentence Embedding Leaks More Information than You Expect: Generative Embedding Inversion Attack to Rec…☆48Jun 3, 2024Updated last year
- RAG-based chatbot for retail e-commerce.☆31Dec 1, 2024Updated last year
- ☆24Jun 17, 2025Updated 8 months ago
- ☆118Jul 2, 2024Updated last year
- PAL: Proxy-Guided Black-Box Attack on Large Language Models☆57Aug 17, 2024Updated last year
- Emoji Attack [ICML 2025]☆41Jul 15, 2025Updated 7 months ago
- ☆24Aug 18, 2023Updated 2 years ago
- ☆86Mar 20, 2025Updated 11 months ago