TrustAIRLab / JailbreakLLMs
A dataset of 6,387 ChatGPT prompts collected from Reddit, Discord, websites, and open-source datasets, including 666 jailbreak prompts.
☆13 · Updated last year
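Since the prompts ship as a plain tabular collection, a few lines of Python are enough to load and skim them. Below is a minimal sketch using the Hugging Face `datasets` library; the dataset ID `TrustAIRLab/in-the-wild-jailbreak-prompts`, the config name `jailbreak_2023_12_25`, and the `prompt`/`platform` columns are assumptions for illustration, not confirmed by this listing.

```python
from collections import Counter
from datasets import load_dataset

# Assumed identifiers: the Hub dataset ID, config name, and column
# names below are illustrative and not confirmed by this listing.
ds = load_dataset(
    "TrustAIRLab/in-the-wild-jailbreak-prompts",
    "jailbreak_2023_12_25",
    split="train",
)

print(ds)                     # row count and column names
print(ds[0]["prompt"][:200])  # preview of the first jailbreak prompt

# Tally prompts by source platform (Reddit, Discord, websites, ...).
print(Counter(ds["platform"]))
```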
Alternatives and similar repositories for JailbreakLLMs
Users interested in JailbreakLLMs are comparing it to the repositories listed below
- ☆13 · Updated last year
- The most comprehensive and accurate LLM jailbreak attack benchmark by far ☆21 · Updated 6 months ago
- 🔥🔥🔥 Detecting hidden backdoors in Large Language Models with only black-box access ☆44 · Updated 4 months ago
- A curated list of Machine Learning Security & Privacy papers published in the top-4 security conferences (IEEE S&P, ACM CCS, USENIX Security… ☆297 · Updated 10 months ago
- Code for our NeurIPS 2024 paper "Improved Generation of Adversarial Examples Against Safety-aligned LLMs" ☆12 · Updated 11 months ago
- ☆36 · Updated last year
- Code for NDSS 2022 paper "MIRROR: Model Inversion for Deep Learning Network with High Fidelity" ☆27 · Updated 2 years ago
- Code for ML Doctor ☆90 · Updated last year
- TrojanZoo provides a universal PyTorch platform to conduct security research (especially backdoor attacks/defenses) of image classifica… ☆301 · Updated last month
- ☆25 · Updated 3 years ago
- Source code for Data-free Backdoor; the paper was accepted at the 32nd USENIX Security Symposium (USENIX Security 2023) ☆31 · Updated 2 years ago
- ☆66 · Updated 5 years ago
- Papers I collected and read during my undergraduate and graduate studies ☆51 · Updated 2 years ago
- [USENIX'24] Prompt Stealing Attacks Against Text-to-Image Generation Models ☆45 · Updated 8 months ago
- Trojan Attack on Neural Network ☆188 · Updated 3 years ago
- Code for the paper "Explanation-Guided Backdoor Poisoning Attacks Against Malware Classifiers" ☆59 · Updated 3 years ago
- A list of recent adversarial attack and defense papers (including those on large language models) ☆43 · Updated last week
- Source code for the "Energy-Latency Attacks via Sponge Poisoning" paper ☆16 · Updated 3 years ago
- [USENIX Security'24] Official repository of "Making Them Ask and Answer: Jailbreaking Large Language Models in Few Queries via Disguise a… ☆108 · Updated 11 months ago
- ☆16 · Updated last year
- Text-CRS: A Generalized Certified Robustness Framework against Textual Adversarial Attacks (IEEE S&P 2024) ☆34 · Updated 3 months ago
- ☆18 · Updated 3 years ago
- A list of recent papers about adversarial learning ☆216 · Updated last week
- A curated list of academic events on AI Security & Privacy ☆162 · Updated last year
- ABS: Scanning Neural Networks for Back-doors by Artificial Brain Stimulation ☆51 · Updated 3 years ago
- [USENIX Security 2023] Code repository for the paper "Towards A Proactive ML Approach for Detecting Backdoor Poison Samples" ☆30 · Updated 2 years ago
- Acceptance-rate statistics for the top security conferences: Oakland (IEEE S&P), CCS, USENIX Security, and NDSS ☆181 · Updated last month
- ☆223 · Updated last month
- Indicators of Attack Failure: Debugging and Improving Optimization of Adversarial Examples ☆19 · Updated 3 years ago
- Watermarking against model extraction attacks in MLaaS. ACM MM 2021. ☆33 · Updated 4 years ago