TrustAIRLab / JailbreakLLMs
A dataset of 6,387 ChatGPT prompts collected from Reddit, Discord, websites, and open-source datasets (including 666 jailbreak prompts).
☆12 · Updated last year
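For orientation, here is a minimal sketch of how such a prompt collection could be loaded and sliced with pandas. The file name `prompts.csv` and the `prompt`, `platform`, and `jailbreak` columns are hypothetical placeholders rather than the repository's documented schema; check the JailbreakLLMs repo for the actual files and field names.

```python
# Minimal sketch (not the repository's documented API): load the prompt
# collection and pull out the jailbreak subset with pandas.
# "prompts.csv" and the column names below are hypothetical placeholders.
import pandas as pd

df = pd.read_csv("prompts.csv")  # hypothetical export of the 6,387 prompts

# The description reports 666 jailbreak prompts; assume a boolean flag column.
jailbreaks = df[df["jailbreak"]]
print(f"total prompts:     {len(df)}")
print(f"jailbreak prompts: {len(jailbreaks)}")

# Break the collection down by source (Reddit, Discord, websites, ...),
# assuming a "platform" column records where each prompt was scraped.
print(df.groupby("platform")["prompt"].count())
```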
Alternatives and similar repositories for JailbreakLLMs
Users interested in JailbreakLLMs are comparing it to the repositories listed below.
- [USENIX'24] Prompt Stealing Attacks Against Text-to-Image Generation Models ☆39 · Updated 6 months ago
- The most comprehensive and accurate LLM jailbreak attack benchmark by far ☆19 · Updated 3 months ago
- Code for the paper "Watermarking Makes Language Models Radioactive" ☆17 · Updated 8 months ago
- [USENIX Security 2025] Official repo of the paper PAPILLON: Efficient and Stealthy Fuzz Testing-Powered Jailbreaks for LLMs ☆14 · Updated last month
- ☆34 · Updated 9 months ago
- ☆13 · Updated last year
- Official implementation of the paper DrAttack: Prompt Decomposition and Reconstruction Makes Powerful LLM Jailbreakers ☆53 · Updated 10 months ago
- Jailbreaking Leading Safety-Aligned LLMs with Simple Adaptive Attacks [ICLR 2025] ☆320 · Updated 5 months ago
- Code for "Biometric Backdoors: A Poisoning Attack Against Unsupervised Template Updating" ☆11 · Updated 3 years ago
- GPTZoo: A Large-scale Dataset of GPTs for the Research Community ☆20 · Updated last year
- This repository provides a benchmark for prompt injection attacks and defenses ☆245 · Updated last month
- LLM security and privacy ☆48 · Updated 9 months ago
- 🤖🛡️🔍🔒🔑 Tiny package designed to support red teams and penetration testers in exploiting large language model AI solutions. ☆23 · Updated last year
- ☆186 · Updated 3 months ago
- A curated list of Machine learning Security & Privacy papers published in security top-4 conferences (IEEE S&P, ACM CCS, USENIX Security… ☆275 · Updated 7 months ago
- Code for our NeurIPS 2024 paper Improved Generation of Adversarial Examples Against Safety-aligned LLMs ☆11 · Updated 8 months ago
- ☆573 · Updated 2 weeks ago
- TrojanZoo provides a universal PyTorch platform to conduct security research (especially backdoor attacks/defenses) of image classifica… ☆291 · Updated 11 months ago
- Implementation of BEAST adversarial attack for language models (ICML 2024) ☆88 · Updated last year
- ☆24 · Updated 3 years ago
- TAP: An automated jailbreaking method for black-box LLMs ☆176 · Updated 7 months ago
- ☆58 · Updated 6 months ago
- LLM Platform Security: Applying a Systematic Evaluation Framework to OpenAI's ChatGPT Plugins ☆25 · Updated 11 months ago
- [USENIX Security '24] An LLM-Assisted Easy-to-Trigger Backdoor Attack on Code Completion Models: Injecting Disguised Vulnerabilities agai… ☆47 · Updated 3 months ago
- Towards Effective Adversarial Textured 3D Meshes on Physical Face Recognition, CVPR 2023, Highlight ☆43 · Updated last year
- Bag of Tricks: Benchmarking of Jailbreak Attacks on LLMs. Empirical tricks for LLM Jailbreaking. (NeurIPS 2024) ☆142 · Updated 7 months ago
- BackdoorLLM: A Comprehensive Benchmark for Backdoor Attacks and Defenses on Large Language Models ☆171 · Updated 3 weeks ago
- ☆102 · Updated last year
- ☆91 · Updated last year
- [AAAI 2024] DataElixir: Purifying Poisoned Dataset to Mitigate Backdoor Attacks via Diffusion Models ☆11 · Updated 7 months ago