TrustAIRLab / JailbreakLLMsLinks
A dataset consists of 6,387 ChatGPT prompts from Reddit, Discord, websites, and open-source datasets (including 666 jailbreak prompts).
☆13Updated last year
Alternatives and similar repositories for JailbreakLLMs
Users that are interested in JailbreakLLMs are comparing it to the libraries listed below
Sorting:
- ☆13Updated last year
- Code for our NeurIPS 2024 paper Improved Generation of Adversarial Examples Against Safety-aligned LLMs☆12Updated 9 months ago
- 🔥🔥🔥 Detecting hidden backdoors in Large Language Models with only black-box access☆41Updated 3 months ago
- The most comprehensive and accurate LLM jailbreak attack benchmark by far☆21Updated 5 months ago
- A curated list of Meachine learning Security & Privacy papers published in security top-4 conferences (IEEE S&P, ACM CCS, USENIX Security…☆288Updated 9 months ago
- [AAAI 2024] DataElixir: Purifying Poisoned Dataset to Mitigate Backdoor Attacks via Diffusion Models☆11Updated 8 months ago
- Code for Voice Jailbreak Attacks Against GPT-4o.☆32Updated last year
- ☆35Updated 11 months ago
- Official implementation of paper: DrAttack: Prompt Decomposition and Reconstruction Makes Powerful LLM Jailbreakers☆58Updated last year
- ☆25Updated 3 years ago
- ☆66Updated 4 years ago
- [USENIX'24] Prompt Stealing Attacks Against Text-to-Image Generation Models☆43Updated 7 months ago
- Source code for the Energy-Latency Attacks via Sponge Poisoning paper.☆15Updated 3 years ago
- A curated list of academic events on AI Security & Privacy☆160Updated last year
- ☆98Updated 4 years ago
- Code to conduct an embedding attack on LLMs☆27Updated 7 months ago
- A repository to quickly generate synthetic data and associated trojaned deep learning models☆79Updated 2 years ago
- Code for paper: "PromptCARE: Prompt Copyright Protection by Watermark Injection and Verification", IEEE S&P 2024.☆32Updated last year
- ☆16Updated 11 months ago
- Website & Documentation: https://sbaresearch.github.io/model-watermarking/☆24Updated last year
- [IEEE S&P'24] ODSCAN: Backdoor Scanning for Object Detection Models☆17Updated 8 months ago
- TrojanZoo provides a universal pytorch platform to conduct security researches (especially backdoor attacks/defenses) of image classifica…☆300Updated last week
- This is the source code for Data-free Backdoor. Our paper is accepted by the 32nd USENIX Security Symposium (USENIX Security 2023).☆31Updated last year
- Code implementation of the paper "Neural Cleanse: Identifying and Mitigating Backdoor Attacks in Neural Networks", at IEEE Security and P…☆296Updated 5 years ago
- Text-CRS: A Generalized Certified Robustness Framework against Textual Adversarial Attacks (IEEE S&P 2024)☆34Updated 2 months ago
- A list of recent papers about adversarial learning☆204Updated last week
- [NDSS 2025] "CLIBE: Detecting Dynamic Backdoors in Transformer-based NLP Models"☆16Updated 2 weeks ago
- Breaking Certifiable Defenses☆17Updated 2 years ago
- Camouflage YOLO - (CAMOLO) trains adversarial patches to confuse the YOLO family of object detectors.☆11Updated 2 years ago
- Code for our S&P'21 paper: Adversarial Watermarking Transformer: Towards Tracing Text Provenance with Data Hiding☆53Updated 2 years ago