TrustAIRLab / JailbreakLLMs
A dataset of 6,387 ChatGPT prompts collected from Reddit, Discord, websites, and open-source datasets, including 666 jailbreak prompts.
☆15 · Updated last year
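For a quick sense of how a prompt collection like this might be explored, here is a minimal sketch in Python. The file name `prompts.csv` and the columns `prompt`, `platform`, and `jailbreak` are illustrative assumptions, not the repository's documented schema.

```python
import pandas as pd

# Hypothetical layout: one row per collected prompt, with a `platform` column
# (Reddit, Discord, website, dataset) and a boolean `jailbreak` flag marking
# the jailbreak prompts. Adjust names to match the actual release.
df = pd.read_csv("prompts.csv")

print(f"{len(df)} prompts total")                  # expected ~6,387
print(df["platform"].value_counts())               # prompts per source
jailbreaks = df[df["jailbreak"]]
print(f"{len(jailbreaks)} flagged as jailbreaks")  # expected ~666
print(jailbreaks["prompt"].str.len().describe())   # rough length statistics
```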
Alternatives and similar repositories for JailbreakLLMs
Users interested in JailbreakLLMs are comparing it to the repositories listed below
- ☆14 · Updated last year
- The most comprehensive and accurate LLM jailbreak attack benchmark by far ☆22 · Updated 10 months ago
- ☆37 · Updated last year
- 🔥🔥🔥 Detecting hidden backdoors in Large Language Models with only black-box access ☆52 · Updated 8 months ago
- Code for our NeurIPS 2024 paper Improved Generation of Adversarial Examples Against Safety-aligned LLMs ☆12 · Updated last year
- A curated list of Machine Learning Security & Privacy papers published in the top-4 security conferences (IEEE S&P, ACM CCS, USENIX Security… ☆332 · Updated 2 months ago
- [USENIX'24] Prompt Stealing Attacks Against Text-to-Image Generation Models ☆49 · Updated last year
- [NeurIPS 2025] BackdoorLLM: A Comprehensive Benchmark for Backdoor Attacks and Defenses on Large Language Models ☆273 · Updated this week
- The automated prompt injection framework for LLM-integrated applications. ☆253 · Updated last year
- ☆27 · Updated 4 years ago
- Implementations of data poisoning attacks against neural networks and related defenses. ☆102 · Updated last year
- A curated list of academic events on AI Security & Privacy ☆167 · Updated last year
- Trojan Attack on Neural Network ☆191 · Updated 3 years ago
- TrojanZoo provides a universal PyTorch platform for conducting security research (especially backdoor attacks/defenses) on image classifica… ☆302 · Updated 5 months ago
- ☆68 · Updated 5 years ago
- [USENIX Security'24] Official repository of "Making Them Ask and Answer: Jailbreaking Large Language Models in Few Queries via Disguise a… ☆113 · Updated last year
- Simple PyTorch implementations of Badnets on MNIST and CIFAR10. ☆193 · Updated 3 years ago
- [USENIX Security '24] An LLM-Assisted Easy-to-Trigger Backdoor Attack on Code Completion Models: Injecting Disguised Vulnerabilities agai… ☆56 · Updated 10 months ago
- This is the source code for Data-free Backdoor. Our paper was accepted at the 32nd USENIX Security Symposium (USENIX Security 2023). ☆34 · Updated 2 years ago
- Statistics of acceptance rates for the top conferences: Oakland, CCS, USENIX Security, NDSS. ☆213 · Updated 3 months ago
- This repository provides a benchmark for prompt injection attacks and defenses in LLMs ☆384 · Updated 3 months ago
- Code implementation of the paper "Neural Cleanse: Identifying and Mitigating Backdoor Attacks in Neural Networks", at IEEE Security and P… ☆314 · Updated 5 years ago
- Reference implementation of the PRADA model stealing defense. IEEE Euro S&P 2019. ☆35 · Updated 6 years ago
- Safety at Scale: A Comprehensive Survey of Large Model Safety ☆225 · Updated this week
- ☆101 · Updated 5 years ago
- PFI: Prompt Flow Integrity to Prevent Privilege Escalation in LLM Agents ☆26 · Updated 10 months ago
- ☆19 · Updated 3 years ago
- A curated list of papers & resources linked to data poisoning, backdoor attacks and defenses against them (no longer maintained) ☆286 · Updated last year
- Code for the paper Explanation-Guided Backdoor Poisoning Attacks Against Malware Classifiers ☆60 · Updated 3 years ago
- Code and full version of the paper "Hijacking Attacks against Neural Network by Analyzing Training Data" ☆14 · Updated last year