qizhangli / Gradient-based-Jailbreak-Attacks
Code for our NeurIPS 2024 paper "Improved Generation of Adversarial Examples Against Safety-aligned LLMs"
☆12 · Updated 9 months ago
Alternatives and similar repositories for Gradient-based-Jailbreak-Attacks
Users interested in Gradient-based-Jailbreak-Attacks are comparing it to the repositories listed below.
- Official codebase for "Image Hijacks: Adversarial Images can Control Generative Models at Runtime" ☆50 · Updated last year
- ☆50 · Updated last year
- ☆102 · Updated last year
- Code for the ICLR 2025 paper "Failures to Find Transferable Image Jailbreaks Between Vision-Language Models" ☆31 · Updated 3 months ago
- Backdoor Safety Tuning (NeurIPS 2023 & 2024 Spotlight) ☆26 · Updated 9 months ago
- Code repository for the paper [USENIX Security 2023] "Towards A Proactive ML Approach for Detecting Backdoor Poison Samples" ☆27 · Updated 2 years ago
- [USENIX'24] "Prompt Stealing Attacks Against Text-to-Image Generation Models" ☆43 · Updated 7 months ago
- ☆20 · Updated 5 months ago
- Code to conduct an embedding attack on LLMs ☆27 · Updated 7 months ago
- A list of recent papers about adversarial learning ☆204 · Updated last week
- ☆53 · Updated 2 years ago
- ☆24 · Updated last year
- [arXiv 2024] "Denial-of-Service Poisoning Attacks on Large Language Models" ☆20 · Updated 10 months ago
- Code for the NeurIPS 2024 paper "Fight Back Against Jailbreaking via Prompt Adversarial Tuning" ☆16 · Updated 3 months ago
- Code for "When LLM Meets DRL: Advancing Jailbreaking Efficiency via DRL-guided Search" (NeurIPS 2024) ☆10 · Updated 10 months ago
- ☆30 · Updated 3 months ago
- Official implementation of "Towards Robust Model Watermark via Reducing Parametric Vulnerability" ☆15 · Updated last year
- Code repository for the paper "Revisiting the Assumption of Latent Separability for Backdoor Defenses" (ICLR 2023) ☆44 · Updated 2 years ago
- "Reconstructive Neuron Pruning for Backdoor Defense" (ICML 2023) ☆39 · Updated last year
- ☆22 · Updated 7 months ago
- ☆47 · Updated last year
- ☆60 · Updated 5 months ago
- ☆32 · Updated 3 months ago
- [ECCV'24 Oral] The official GitHub page for "Images are Achilles' Heel of Alignment: Exploiting Visual Vulnerabilities for Jailbreaking …" ☆27 · Updated 10 months ago
- ☆24 · Updated last year
- Official implementation of the USENIX Security'23 paper "Meta-Sift" -- ten minutes or less to find a 1000-size or larger clean subset on … ☆19 · Updated 2 years ago
- [ICLR 2024] "Inducing High Energy-Latency of Large Vision-Language Models with Verbose Images" ☆38 · Updated last year
- Official implementation of the CVPR 2022 paper "Backdoor Attacks on Self-Supervised Learning" ☆75 · Updated last year
- Implementation of BadCLIP (https://arxiv.org/pdf/2311.16194.pdf) ☆21 · Updated last year
- Repository for the paper "Refusing Safe Prompts for Multi-modal Large Language Models" ☆18 · Updated 10 months ago