TrustAIRLab / JailbreakLLMs
A dataset of 6,387 ChatGPT prompts collected from Reddit, Discord, websites, and open-source datasets, including 666 jailbreak prompts.
⭐14 · Updated last year
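As a quick orientation to the collection described above, here is a minimal sketch of how one might load and summarize the prompts. It assumes, purely for illustration, that the data is distributed as a CSV file named `prompts.csv` with columns `prompt`, `platform`, and `jailbreak`; the actual file layout in the repository may differ.

```python
# Illustrative only: the file name "prompts.csv" and the columns
# "prompt", "platform", and "jailbreak" are assumptions, not the
# repository's confirmed schema.
import csv
from collections import Counter

def load_prompts(path="prompts.csv"):
    """Read the prompt collection into a list of dicts, one per prompt."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))

if __name__ == "__main__":
    rows = load_prompts()
    # Count prompts per source platform (Reddit, Discord, websites, ...).
    by_platform = Counter(r.get("platform", "unknown") for r in rows)
    # Count entries flagged as jailbreak prompts.
    n_jailbreak = sum(1 for r in rows if str(r.get("jailbreak", "")).lower() == "true")
    print(f"total prompts: {len(rows)}")        # expected: 6,387 for the full set
    print(f"jailbreak prompts: {n_jailbreak}")  # expected: 666 for the full set
    print("prompts by platform:", dict(by_platform))
```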
Alternatives and similar repositories for JailbreakLLMs
Users interested in JailbreakLLMs are comparing it to the repositories listed below.
- The most comprehensive and accurate LLM jailbreak attack benchmark by far ⭐21 · Updated 7 months ago
- 🔥🔥🔥 Detecting hidden backdoors in Large Language Models with only black-box access ⭐45 · Updated 5 months ago
- Code for our NeurIPS 2024 paper Improved Generation of Adversarial Examples Against Safety-aligned LLMs ⭐12 · Updated 11 months ago
- ⭐13 · Updated last year
- ⭐25 · Updated 4 years ago
- ⭐37 · Updated last year
- [USENIX'24] Prompt Stealing Attacks Against Text-to-Image Generation Models ⭐47 · Updated 9 months ago
- Code for Voice Jailbreak Attacks Against GPT-4o. ⭐36 · Updated last year
- Official implementation of paper: DrAttack: Prompt Decomposition and Reconstruction Makes Powerful LLM Jailbreakers ⭐64 · Updated last year
- Code to conduct an embedding attack on LLMs ⭐27 · Updated 9 months ago
- A curated list of Machine Learning Security & Privacy papers published in security top-4 conferences (IEEE S&P, ACM CCS, USENIX Security… ⭐299 · Updated 11 months ago
- Implementations of data poisoning attacks against neural networks and related defenses. ⭐95 · Updated last year
- [NeurIPS 2025] BackdoorLLM: A Comprehensive Benchmark for Backdoor Attacks and Defenses on Large Language Models ⭐227 · Updated last week
- TAP: An automated jailbreaking method for black-box LLMs ⭐194 · Updated 10 months ago
- [USENIX Security '24] An LLM-Assisted Easy-to-Trigger Backdoor Attack on Code Completion Models: Injecting Disguised Vulnerabilities agai… ⭐52 · Updated 7 months ago
- [Usenix Security 2025] Official repo of paper PAPILLON: Efficient and Stealthy Fuzz Testing-Powered Jailbreaks for LLMs ⭐16 · Updated 5 months ago
- ⭐66 · Updated 5 years ago
- [USENIX Security'24] Official repository of "Making Them Ask and Answer: Jailbreaking Large Language Models in Few Queries via Disguise a… ⭐109 · Updated last year
- ⭐65 · Updated 10 months ago
- A repository to quickly generate synthetic data and associated trojaned deep learning models ⭐82 · Updated 2 years ago
- ⭐25 · Updated 3 years ago
- Source code for the Energy-Latency Attacks via Sponge Poisoning paper. ⭐16 · Updated 3 years ago
- Jailbreaking Leading Safety-Aligned LLMs with Simple Adaptive Attacks [ICLR 2025] ⭐357 · Updated 9 months ago
- Official repo for GPTFUZZER: Red Teaming Large Language Models with Auto-Generated Jailbreak Prompts ⭐533 · Updated last year
- Trojan Attack on Neural Network ⭐188 · Updated 3 years ago
- TrojanZoo provides a universal PyTorch platform to conduct security research (especially backdoor attacks/defenses) of image classifica… ⭐301 · Updated 2 months ago
- ⭐95 · Updated 2 years ago
- The automated prompt injection framework for LLM-integrated applications. ⭐235 · Updated last year
- Research on "Many-Shot Jailbreaking" in Large Language Models (LLMs). It unveils a novel technique capable of bypassing the safety mechan… ⭐14 · Updated last year
- Siren: Byzantine-robust Federated Learning via Proactive Alarming (SoCC '21) ⭐11 · Updated last year