LLM-DRA/DRA

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/LLM-DRA/DRA)

LLM-DRA / DRA

[USENIX Security'24] Official repository of "Making Them Ask and Answer: Jailbreaking Large Language Models in Few Queries via Disguise and Reconstruction"

☆116

Alternatives and similar repositories for DRA

Users that are interested in DRA are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

xirui-li / DrAttack
View on GitHub
Official implementation of paper: DrAttack: Prompt Decomposition and Reconstruction Makes Powerful LLM Jailbreakers
☆68Aug 25, 2024Updated last year
huizhang-L / CodeChameleon
View on GitHub
☆30Mar 20, 2024Updated 2 years ago
AI45Lab / ActorAttack
View on GitHub
☆135Jun 29, 2026Updated 3 weeks ago
weizeming / momentum-attack-llm
View on GitHub
☆25Jan 17, 2025Updated last year
AI45Lab / CodeAttack
View on GitHub
[ACL 2024] CodeAttack: Revealing Safety Generalization Challenges of Large Language Models via Code Completion
☆61Oct 1, 2025Updated 9 months ago
Deploy open-source AI quickly and easily - Special Bonus Offer • Ad
Runpod Hub is built for open source. One-click deployment and autoscaling endpoints without provisioning your own infrastructure.
YancyKahn / CoA
View on GitHub
Chain of Attack: a Semantic-Driven Contextual Multi-Turn attacker for LLM
☆39Jan 17, 2025Updated last year
LLMSecurity / MasterKey
View on GitHub
MASTERKEY is a framework designed to explore and exploit vulnerabilities in large language model chatbots by automating jailbreak attacks…
☆38Sep 12, 2024Updated last year
kriti-hippo / red_queen
View on GitHub
Red Queen Dataset and data generation template
☆27Dec 26, 2025Updated 6 months ago
patrickrchao / JailbreakingLLMs
View on GitHub
☆756Jul 2, 2025Updated last year
Confirm-Solutions / flrt
View on GitHub
Fluent student-teacher redteaming
☆23Jul 25, 2024Updated last year
PKU-ML / PAT
View on GitHub
Code for NeurIPS 2024 Paper "Fight Back Against Jailbreaking via Prompt Adversarial Tuning"
☆22May 6, 2025Updated last year
CryptoAILab / misalignment
View on GitHub
[NDSS'25] The official implementation of safety misalignment.
☆19Jan 8, 2025Updated last year
NY1024 / SafeBench
View on GitHub
☆22Oct 25, 2024Updated last year
sail-sg / I-FSJ
View on GitHub
Improved Few-Shot Jailbreaking Can Circumvent Aligned Language Models and Their Defenses (NeurIPS 2024)
☆65Jan 11, 2025Updated last year
Wordpress hosting with auto-scaling - Free Trial Offer • Ad
Fully Managed hosting for WordPress and WooCommerce businesses that need reliable, auto-scalable performance. Cloudways SafeUpdates now available.
GraySwanAI / nanoGCG
View on GitHub
A fast + lightweight implementation of the GCG algorithm in PyTorch
☆343May 13, 2025Updated last year
lapisrocks / rpo
View on GitHub
Official repository for "Robust Prompt Optimization for Defending Language Models Against Jailbreaking Attacks"
☆62Aug 8, 2024Updated last year
CryptoAILab / JailbreakEval
View on GitHub
[NDSS'25 Best Technical Poster] A collection of automated evaluators for assessing jailbreak attempts.
☆193Apr 1, 2025Updated last year
thu-coai / JailbreakDefense_GoalPriority
View on GitHub
[ACL 2024] Defending Large Language Models Against Jailbreaking Attacks Through Goal Prioritization
☆29Jul 9, 2024Updated 2 years ago
SproutNan / AI-Safety_Benchmark
View on GitHub
The official repository for guided jailbreak benchmark
☆31Jul 28, 2025Updated 11 months ago
NY1024 / RACE
View on GitHub
☆27Mar 17, 2025Updated last year
NJUNLP / ReNeLLM
View on GitHub
The official implementation of our NAACL 2024 paper "A Wolf in Sheep’s Clothing: Generalized Nested Jailbreak Prompts can Fool Large Lang…
☆163Sep 2, 2025Updated 10 months ago
Allen-piexl / JailbreakZoo
View on GitHub
☆171Sep 2, 2024Updated last year
RICommunity / TAP
View on GitHub
TAP: An automated jailbreaking method for black-box LLMs
☆241Dec 10, 2024Updated last year
End-to-end encrypted email - Proton Mail • Ad
Special offer: 40% Off Yearly / 80% Off First Month. All Proton services are open source and independently audited for security.
theshi-1128 / ReDPJ
View on GitHub
A novel jailbreak attack unveiling an overlooked attack surface inherently in the chain-of-thought reasoning trajectory of LLMs
☆22Apr 3, 2026Updated 3 months ago
RUCAIBox / HADES
View on GitHub
[ECCV'24 Oral] The official GitHub page for ''Images are Achilles' Heel of Alignment: Exploiting Visual Vulnerabilities for Jailbreaking …
☆39Oct 23, 2024Updated last year
verazuo / prompt-stealing-attack
View on GitHub
[USENIX'24] Prompt Stealing Attacks Against Text-to-Image Generation Models
☆53Jan 11, 2025Updated last year
centerforaisafety / HarmBench
View on GitHub
HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal
☆1,011Aug 16, 2024Updated last year
MaTengSYSU / HIMRD-jailbreak
View on GitHub
Code repository for the paper "Heuristic Induced Multimodal Risk Distribution Jailbreak Attack for Multimodal Large Language Models"
☆19Aug 7, 2025Updated 11 months ago
Yu-Fangxu / COLD-Attack
View on GitHub
[ICML 2024] COLD-Attack: Jailbreaking LLMs with Stealthiness and Controllability
☆176Dec 18, 2024Updated last year
LLM-Tuning-Safety / LLMs-Finetuning-Safety
View on GitHub
We jailbreak GPT-3.5 Turbo’s safety guardrails by fine-tuning it on only 10 adversarially designed examples, at a cost of less than $0.20…
☆358Feb 23, 2024Updated 2 years ago
Jinxiaolong1129 / Foot-in-the-door-Jailbreak
View on GitHub
☆23May 14, 2025Updated last year
umutoztunc / whitesymex
View on GitHub
Symbolic execution engine for Whitespace.
☆14May 30, 2021Updated 5 years ago
Open source password manager - Proton Pass • Ad
Securely store, share, and autofill your credentials with Proton Pass, the end-to-end encrypted password manager trusted by millions.
thunxxx / MLLM-Jailbreak-evaluation-MMJ-Bench
View on GitHub
☆81Mar 30, 2025Updated last year
thefcraft / prompt-generator-stable-diffusion
View on GitHub
Prompt Generator model for Stable Diffusion Models
☆12Jun 20, 2023Updated 3 years ago
tml-epfl / llm-adaptive-attacks
View on GitHub
Jailbreaking Leading Safety-Aligned LLMs with Simple Adaptive Attacks [ICLR 2025]
☆391Jan 23, 2025Updated last year
uw-nsl / ArtPrompt
View on GitHub
[ACL24] Official Repo of Paper `ArtPrompt: ASCII Art-based Jailbreak Attacks against Aligned LLMs`
☆102Aug 15, 2025Updated 11 months ago
alphadl / SafeLLM_with_IntentionAnalysis
View on GitHub
Towards Safe LLM with our simple-yet-highly-effective Intention Analysis Prompting
☆21Mar 25, 2024Updated 2 years ago
XuandongZhao / weak-to-strong
View on GitHub
[ICML 2025] Weak-to-Strong Jailbreaking on Large Language Models
☆90May 2, 2025Updated last year
yjw1029 / Self-Reminder
View on GitHub
Code for our paper "Defending ChatGPT against Jailbreak Attack via Self-Reminder" in NMI.
☆57Nov 13, 2023Updated 2 years ago