aaFrostnova / Papillon

[Usenix Security 2025] Official repo of paper PAPILLON: Efficient and Stealthy Fuzz Testing-Powered Jailbreaks for LLMs

☆16

Alternatives and similar repositories for Papillon

Users who are interested in Papillon are comparing it to the repositories listed below

theshi-1128 / jailbreak-bench
The most comprehensive and accurate LLM jailbreak attack benchmark by far
☆21 · Updated 7 months ago
TrustAI-laboratory / Many-Shot-Jailbreaking-Demo
Research on "Many-Shot Jailbreaking" in Large Language Models (LLMs). It unveils a novel technique capable of bypassing the safety mechan…
☆14 · Updated last year
UNHSAILLab / working-memory-attack-on-llms
Working Memory Attack on LLMs
☆16 · Updated 5 months ago
shadowkiller33 / Language_attack
A repo for LLM jailbreak
☆14 · Updated 2 years ago
Aegis1863 / xJailbreak
Code of paper: "xJailbreak: Representation Space Guided Reinforcement Learning for Interpretable LLM Jailbreaking"
☆14 · Updated 7 months ago
xirui-li / DrAttack
Official implementation of paper: DrAttack: Prompt Decomposition and Reconstruction Makes Powerful LLM Jailbreakers
☆64 · Updated last year
NJUNLP / ReNeLLM
The official implementation of our NAACL 2024 paper "A Wolf in Sheep’s Clothing: Generalized Nested Jailbreak Prompts can Fool Large Lang…
☆141 · Updated 2 months ago
xunguangwang / SoK4JailbreakGuardrails
SoK: Evaluating Jailbreak Guardrails for Large Language Models
☆18 · Updated 2 weeks ago
NLie2 / what_features_jailbreak_LLMs
☆17 · Updated 7 months ago
BHui97 / PLeak
☆65 · Updated 10 months ago
BenderScript / PromptInjectionBench
Prompt Injection Attacks against GPT-4, Gemini, Azure, Azure with Jailbreak
☆28 · Updated last year
Vinsonzyh / BlueSuffix
[ICLR 2025] BlueSuffix: Reinforced Blue Teaming for Vision-Language Models Against Jailbreak Attacks
☆27 · Updated 6 months ago
kztakemoto / simbaja
All in How You Ask for It: Simple Black-Box Method for Jailbreak Attacks
☆18 · Updated last year
uw-nsl / ArtPrompt
[ACL24] Official Repo of Paper `ArtPrompt: ASCII Art-based Jailbreak Attacks against Aligned LLMs`
☆90 · Updated 2 months ago
zshcatsandevops / Grok3JailbreaksDB2024
1.0
☆11 · Updated 4 months ago
theshi-1128 / ReDPJ
☆21 · Updated last month
NY1024 / Jailbreak_GPT4o
☆25 · Updated last year
liu00222 / Open-Prompt-Injection
This repository provides a benchmark for prompt injection attacks and defenses
☆318 · Updated this week
RICommunity / TAP
TAP: An automated jailbreaking method for black-box LLMs
☆194 · Updated 10 months ago
TrustAIRLab / JailbreakLLMs
A dataset consisting of 6,387 ChatGPT prompts from Reddit, Discord, websites, and open-source datasets (including 666 jailbreak prompts).
☆14 · Updated last year
Allen-piexl / JailbreakZoo
☆153 · Updated last year
CryptoAILab / FigStep
[AAAI'25 (Oral)] Jailbreaking Large Vision-language Models via Typographic Visual Prompts
☆177 · Updated 4 months ago
LLM-DRA / DRA
[USENIX Security'24] Official repository of "Making Them Ask and Answer: Jailbreaking Large Language Models in Few Queries via Disguise a…
☆109 · Updated last year
RylanSchaeffer / AstraFellowship-When-Do-VLM-Image-Jailbreaks-Transfer
Code for ICLR 2025 Failures to Find Transferable Image Jailbreaks Between Vision-Language Models
☆32 · Updated 5 months ago
XHMY / AutoDefense
AutoDefense: Multi-Agent LLM Defense against Jailbreak Attacks
☆56 · Updated 5 months ago
rucnyz / LeakAgent
☆24 · Updated 2 months ago
facebookresearch / multimodal-fusion-jailbreaks
Official repository for the paper "Gradient-based Jailbreak Images for Multimodal Fusion Models" (https://arxiv.org/abs/2410.03489)
☆19 · Updated last year
ssahibsingh / MERN-Session-TSS23
Web Development Session Thapar Summer School 2023
☆11 · Updated 2 years ago
KutalVolkan / many-shot-jailbreaking-dataset
Q&A dataset for many-shot jailbreaking
☆12 · Updated last year
tml-epfl / llm-adaptive-attacks
Jailbreaking Leading Safety-Aligned LLMs with Simple Adaptive Attacks [ICLR 2025]
☆357 · Updated 9 months ago