Code for Findings-EMNLP 2023 paper: Multi-step Jailbreaking Privacy Attacks on ChatGPT
☆37 · Oct 15, 2023 · Updated 2 years ago
Alternatives and similar repositories for LLM-Multistep-Jailbreak
Users that are interested in LLM-Multistep-Jailbreak are comparing it to the libraries listed below.
- Code for our NeurIPS 2024 paper Improved Generation of Adversarial Examples Against Safety-aligned LLMs ☆12 · Nov 7, 2024 · Updated last year
- [ICLR 2024 Spotlight 🔥] - [Best Paper Award SoCal NLP 2023] - Jailbreak in pieces: Compositional Adversarial Attacks on Multi-Modal… ☆82 · Jun 6, 2024 · Updated last year
- Code for ACL 2024 paper: PrivLM-Bench: A Multi-level Privacy Evaluation Benchmark for Language Models. ☆16 · Feb 5, 2025 · Updated last year
- HKUST COMP4651 Fall 18/19: Cloud Computing and Big Data Systems ☆23 · Feb 6, 2019 · Updated 7 years ago
- Official repository for the paper "Gradient-based Jailbreak Images for Multimodal Fusion Models" (https://arxiv.org/abs/2410.03489) ☆20 · Oct 22, 2024 · Updated last year
- The repository contains the code for analysing the leakage of personally identifiable (PII) information from the output of next word pred… ☆104 · Aug 13, 2024 · Updated last year
- Official Code for ACL 2023 paper: "Ethicist: Targeted Training Data Extraction Through Loss Smoothed Soft Prompting and Calibrated Confid… ☆23 · May 8, 2023 · Updated 3 years ago
- Welcome to the official repository for Siren, a project aimed at understanding and mitigating harmful behaviors in large language models … ☆15 · Sep 12, 2025 · Updated 8 months ago
- [ICML 2025] Speak Easy: Eliciting Harmful Jailbreaks from LLMs with Simple Interactions ☆14 · Mar 7, 2026 · Updated 2 months ago
- ☆25 · Jan 17, 2025 · Updated last year
- Code for NeurIPS 2024 Paper "Fight Back Against Jailbreaking via Prompt Adversarial Tuning" ☆22 · May 6, 2025 · Updated last year
- Red Queen Dataset and data generation template ☆26 · Dec 26, 2025 · Updated 4 months ago
- ☆133 · Feb 3, 2025 · Updated last year
- CVPR 2023 generalist ☆16 · Oct 25, 2023 · Updated 2 years ago
- Code for "When LLM Meets DRL: Advancing Jailbreaking Efficiency via DRL-guided Search" (NeurIPS 2024) ☆18 · Oct 22, 2024 · Updated last year
- [ICML 2025] An official source code for paper "FlipAttack: Jailbreak LLMs via Flipping". ☆172 · May 2, 2025 · Updated last year
- Official repository for "Robust Prompt Optimization for Defending Language Models Against Jailbreaking Attacks" ☆62 · Aug 8, 2024 · Updated last year
- ☆16 · Sep 4, 2024 · Updated last year
- SG-Bench: Evaluating LLM Safety Generalization Across Diverse Tasks and Prompt Types ☆25 · Nov 29, 2024 · Updated last year
- ☆29 · Aug 31, 2025 · Updated 8 months ago
- code space of paper "Safety Layers in Aligned Large Language Models: The Key to LLM Security" (ICLR 2025) ☆24 · Apr 26, 2025 · Updated last year
- Accepted by ECCV 2024 ☆206 · Oct 15, 2024 · Updated last year
- We jailbreak GPT-3.5 Turbo’s safety guardrails by fine-tuning it on only 10 adversarially designed examples, at a cost of less than $0.20… ☆350 · Feb 23, 2024 · Updated 2 years ago
- [AAAI'25 (Oral)] Jailbreaking Large Vision-language Models via Typographic Visual Prompts ☆204 · Jun 26, 2025 · Updated 10 months ago
- Bag of Tricks: Benchmarking of Jailbreak Attacks on LLMs. Empirical tricks for LLM Jailbreaking. (NeurIPS 2024) ☆162 · Nov 30, 2024 · Updated last year
- Code for Findings-ACL 2023 paper: Sentence Embedding Leaks More Information than You Expect: Generative Embedding Inversion Attack to Rec… ☆47 · Jun 3, 2024 · Updated last year
- Extracting Cultural Commonsense Knowledge at Scale (WWW 2023) ☆11 · Feb 15, 2024 · Updated 2 years ago
- Codes for our paper "AgentMonitor: A Plug-and-Play Framework for Predictive and Secure Multi-Agent Systems" ☆13 · Dec 13, 2024 · Updated last year
- ☆26 · Aug 18, 2023 · Updated 2 years ago
- ☆39 · Oct 15, 2024 · Updated last year
- [TMLR 2025] Official implementation of AttnGCG: Enhancing Jailbreaking Attacks on LLMs with Attention Manipulation ☆26 · Jun 17, 2025 · Updated 11 months ago
- ☆12 · Dec 22, 2025 · Updated 5 months ago
- ☆40 · May 17, 2025 · Updated last year
- General research for Dreadnode ☆27 · Jun 17, 2024 · Updated last year
- ☆48 · May 9, 2024 · Updated 2 years ago
- Code for paper "Universal Jailbreak Backdoors from Poisoned Human Feedback" ☆65 · Apr 24, 2024 · Updated 2 years ago
- Chain of Attack: a Semantic-Driven Contextual Multi-Turn attacker for LLM ☆39 · Jan 17, 2025 · Updated last year
- ☆27 · Jun 5, 2024 · Updated last year
- ☆18 · Nov 30, 2022 · Updated 3 years ago