qizhangli / Gradient-based-Jailbreak-Attacks
Code for our NeurIPS 2024 paper "Improved Generation of Adversarial Examples Against Safety-aligned LLMs"
☆12 · Updated last year
Alternatives and similar repositories for Gradient-based-Jailbreak-Attacks
Users interested in Gradient-based-Jailbreak-Attacks are comparing it to the repositories listed below.
- ☆53 · Updated last year
- The Oyster series is a set of safety models developed in-house by Alibaba-AAIG, devoted to building a responsible AI ecosystem. | Oyster … · ☆56 · Updated 2 months ago
- Backdoor Safety Tuning (NeurIPS 2023 & 2024 Spotlight) · ☆27 · Updated last year
- ☆107 · Updated last year
- Official codebase for Image Hijacks: Adversarial Images can Control Generative Models at Runtime · ☆53 · Updated 2 years ago
- ☆23 · Updated 10 months ago
- Code for the ACM MM 2024 paper "White-box Multimodal Jailbreaks Against Large Vision-Language Models" · ☆30 · Updated 11 months ago
- ☆24 · Updated last year
- Official repository for "Robust Prompt Optimization for Defending Language Models Against Jailbreaking Attacks" · ☆59 · Updated last year
- Reconstructive Neuron Pruning for Backdoor Defense (ICML 2023) · ☆39 · Updated last year
- ☆52 · Updated last year
- Code for the paper "Universal Jailbreak Backdoors from Poisoned Human Feedback" · ☆62 · Updated last year
- Implementation of the paper "Improving the Accuracy-Robustness Trade-off of Classifiers via Adaptive Smoothing" · ☆10 · Updated last year
- Code repository for the paper "Towards A Proactive ML Approach for Detecting Backdoor Poison Samples" (USENIX Security 2023)