qizhangli / Gradient-based-Jailbreak-AttacksLinks
Code for our NeurIPS 2024 paper Improved Generation of Adversarial Examples Against Safety-aligned LLMs
☆12Updated 11 months ago
Alternatives and similar repositories for Gradient-based-Jailbreak-Attacks
Users that are interested in Gradient-based-Jailbreak-Attacks are comparing it to the libraries listed below
Sorting:
- ICCV 2021, We find most existing triggers of backdoor attacks in deep learning contain severe artifacts in the frequency domain. This Rep…☆45Updated 3 years ago
- Code repository for the paper --- [USENIX Security 2023] Towards A Proactive ML Approach for Detecting Backdoor Poison Samples☆30Updated 2 years ago
- ☆105Updated last year
- Official implementation of the CVPR 2022 paper "Backdoor Attacks on Self-Supervised Learning".☆76Updated 2 years ago
- Backdoor Safety Tuning (NeurIPS 2023 & 2024 Spotlight)☆26Updated 11 months ago
- Code for ICLR 2025 Failures to Find Transferable Image Jailbreaks Between Vision-Language Models☆32Updated 5 months ago
- ☆53Updated last year
- Official codebase for Image Hijacks: Adversarial Images can Control Generative Models at Runtime☆51Updated 2 years ago
- ☆24Updated last year
- A list of recent papers about adversarial learning☆231Updated this week
- [USENIX'24] Prompt Stealing Attacks Against Text-to-Image Generation Models☆47Updated 9 months ago
- Implementation of the paper "Improving the Accuracy-Robustness Trade-off of Classifiers via Adaptive Smoothing".☆10Updated last year
- ☆10Updated 3 years ago
- The Oyster series is a set of safety models developed in-house by Alibaba-AAIG, devoted to building a responsible AI ecosystem. | Oyster …☆55Updated last month
- The official implementation of USENIX Security'23 paper "Meta-Sift" -- Ten minutes or less to find a 1000-size or larger clean subset on …☆19Updated 2 years ago
- SaTML 2023, 1st place in CVPR’21 Security AI Challenger: Unrestricted Adversarial Attacks on ImageNet.☆27Updated 2 years ago
- Code for the paper Boosting Accuracy and Robustness of Student Models via Adaptive Adversarial Distillation (CVPR 2023).☆33Updated 2 years ago
- ☆23Updated 9 months ago
- Code for ACM MM2024 paper: White-box Multimodal Jailbreaks Against Large Vision-Language Models☆30Updated 10 months ago
- Code Repository for the Paper ---Revisiting the Assumption of Latent Separability for Backdoor Defenses (ICLR 2023)☆44Updated 2 years ago
- A curated list of trustworthy Generative AI papers. Daily updating...☆75Updated last year
- ☆53Updated 2 years ago
- Reconstructive Neuron Pruning for Backdoor Defense (ICML 2023)☆39Updated last year
- [CVPR23W] "A Pilot Study of Query-Free Adversarial Attack against Stable Diffusion" by Haomin Zhuang, Yihua Zhang and Sijia Liu☆26Updated last year
- ☆12Updated 3 years ago
- Codes for NeurIPS 2021 paper "Adversarial Neuron Pruning Purifies Backdoored Deep Models"☆58Updated 2 years ago
- [ICLR'21] Dataset Inference for Ownership Resolution in Machine Learning☆32Updated 3 years ago
- ☆25Updated last year
- Code for NeurIPS 2024 Paper "Fight Back Against Jailbreaking via Prompt Adversarial Tuning"☆20Updated 5 months ago
- ☆25Updated 2 years ago