TrustAIRLab / JailbreakLLMs
A dataset of 6,387 ChatGPT prompts collected from Reddit, Discord, websites, and open-source datasets (including 666 jailbreak prompts).
☆12 · Updated last year
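For orientation, here is a minimal sketch of how such a prompt collection could be loaded and sliced with pandas. The file name `prompts.csv` and the `prompt`, `platform`, and `jailbreak` columns are hypothetical placeholders rather than the repository's documented schema; check the JailbreakLLMs repo for the actual files and field names.

```python
# Minimal sketch (not the repository's documented API): load the prompt
# collection and pull out the jailbreak subset with pandas.
# "prompts.csv" and the column names below are hypothetical placeholders.
import pandas as pd

df = pd.read_csv("prompts.csv")  # hypothetical export of the 6,387 prompts

# The description reports 666 jailbreak prompts; assume a boolean flag column.
jailbreaks = df[df["jailbreak"]]
print(f"total prompts:     {len(df)}")
print(f"jailbreak prompts: {len(jailbreaks)}")

# Break the collection down by source (Reddit, Discord, websites, ...),
# assuming a "platform" column records where each prompt was scraped.
print(df.groupby("platform")["prompt"].count())
```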
Alternatives and similar repositories for JailbreakLLMs
Users interested in JailbreakLLMs are comparing it to the repositories listed below.
- [USENIX'24] Prompt Stealing Attacks Against Text-to-Image Generation Models ☆39 · Updated 6 months ago
- The most comprehensive and accurate LLM jailbreak attack benchmark by far ☆19 · Updated 3 months ago
- Code for the paper "Watermarking Makes Language Models Radioactive" ☆17 · Updated 8 months ago
- [USENIX Security 2025] Official repo of the paper PAPILLON: Efficient and Stealthy Fuzz Testing-Powered Jailbreaks for LLMs ☆14 · Updated last month
- ☆34 · Updated 9 months ago
- ☆13 · Updated last year
- Official implementation of the paper DrAttack: Prompt Decomposition and Reconstruction Makes Powerful LLM Jailbreakers ☆53 · Updated 10 months ago
- Jailbreaking Leading Safety-Aligned LLMs with Simple Adaptive Attacks [ICLR 2025] ☆320 · Updated 5 months ago
- Code for "Biometric Backdoors: A Poisoning Attack Against Unsupervised Template Updating" ☆11 · Updated 3 years ago
- GPTZoo: A Large-scale Dataset of GPTs for the Research Community ☆20 · Updated last year
- This repository provides a benchmark for prompt injection attacks and defenses ☆245 · Updated last month
- LLM security and privacy ☆48 · Updated 9 months ago
- 🤖🛡️🔍🔒🔑 Tiny package designed to support red teams and penetration testers in exploiting large language model AI solutions. ☆23 · Updated last year
- ☆186 · Updated 3 months ago
- A curated list of Machine learning Security & Privacy papers published in security top-4 conferences (IEEE S&P, ACM CCS, USENIX Security… ☆275 · Updated 7 months ago
- Code for our NeurIPS 2024 paper Improved Generation of Adversarial Examples Against Safety-aligned LLMs ☆11 · Updated 8 months ago
- ☆573 · Updated 2 weeks ago
- TrojanZoo provides a universal PyTorch platform to conduct security research (especially backdoor attacks/defenses) of image classifica… ☆291 · Updated 11 months ago
- Implementation of BEAST adversarial attack for language models (ICML 2024) ☆88 · Updated last year
- ☆24 · Updated 3 years ago
- TAP: An automated jailbreaking method for black-box LLMs ☆176 · Updated 7 months ago
- ☆58 · Updated 6 months ago
- LLM Platform Security: Applying a Systematic Evaluation Framework to OpenAI's ChatGPT Plugins ☆25 · Updated 11 months ago
- [USENIX Security '24] An LLM-Assisted Easy-to-Trigger Backdoor Attack on Code Completion Models: Injecting Disguised Vulnerabilities agai… ☆47 · Updated 3 months ago
- Towards Effective Adversarial Textured 3D Meshes on Physical Face Recognition, CVPR 2023, Highlight ☆43 · Updated last year
- Bag of Tricks: Benchmarking of Jailbreak Attacks on LLMs. Empirical tricks for LLM Jailbreaking. (NeurIPS 2024) ☆142 · Updated 7 months ago
- BackdoorLLM: A Comprehensive Benchmark for Backdoor Attacks and Defenses on Large Language Models ☆171 · Updated 3 weeks ago
- ☆102 · Updated last year
- ☆91 · Updated last year
- [AAAI 2024] DataElixir: Purifying Poisoned Dataset to Mitigate Backdoor Attacks via Diffusion Models ☆11 · Updated 7 months ago