Code for our NeurIPS 2024 paper Improved Generation of Adversarial Examples Against Safety-aligned LLMs
☆12Nov 7, 2024Updated last year
Alternatives and similar repositories for Gradient-based-Jailbreak-Attacks
Users who are interested in Gradient-based-Jailbreak-Attacks are comparing it to the repositories listed below.
- Code for Findings-EMNLP 2023 paper: Multi-step Jailbreaking Privacy Attacks on ChatGPT☆36Oct 15, 2023Updated 2 years ago
- [ACL 25] SafeChain: Safety of Language Models with Long Chain-of-Thought Reasoning Capabilities☆29Apr 2, 2025Updated 11 months ago
- A repo for LLM jailbreak☆14Sep 5, 2023Updated 2 years ago
- [Tensorflow] A Game Theoretic approach using GAN for Phishing URL synthesis and detection☆11Nov 14, 2022Updated 3 years ago
- ☆22Aug 8, 2025Updated 7 months ago
- This novel adversarial attack method that I have developed is called CEIA (Contextual Embedding Inversion Attack). This method is a sophi…☆16Nov 19, 2024Updated last year
- Repository for project codes related to turbulence modelling.☆17Sep 30, 2024Updated last year
- ☆17Mar 8, 2024Updated 2 years ago
- [CIKM 2024] Trojan Activation Attack: Attack Large Language Models using Activation Steering for Safety-Alignment.☆29Jul 29, 2024Updated last year
- All in How You Ask for It: Simple Black-Box Method for Jailbreak Attacks☆18Apr 24, 2024Updated last year
- Blogs that I'm actively following.☆14Sep 17, 2023Updated 2 years ago
- Challenges and source code from the 2021 Jinan University CTF freshman competition☆15Dec 6, 2021Updated 4 years ago
- Materials for "Multi-property Steering of Large Language Models with Dynamic Activation Composition"☆14Nov 22, 2024Updated last year
- Identification of the Adversary from a Single Adversarial Example (ICML 2023)☆10Jul 15, 2024Updated last year
- Links to publications that focus on the interpretation and analysis of in-context learning☆15Oct 17, 2024Updated last year
- Jailbreaking Leading Safety-Aligned LLMs with Simple Adaptive Attacks [ICLR 2025]☆380Jan 23, 2025Updated last year
- ☆10Apr 28, 2020Updated 5 years ago
- Advanced Machine Learning Fall 2020 Project Repository☆12Dec 12, 2020Updated 5 years ago
- CTF: study notes, competition challenges, and write-ups☆24Dec 9, 2023Updated 2 years ago
- [EMNLP'22] Textual Manifold-based Defense Against Natural Language Adversarial Examples☆11Apr 6, 2023Updated 2 years ago
- [NeurIPS 2024] Fight Back Against Jailbreaking via Prompt Adversarial Tuning☆11Oct 29, 2024Updated last year
- Our research proposes a novel MoGU framework that improves LLMs' safety while preserving their usability.☆18Jan 14, 2025Updated last year
- A collection of LLM code written by hand from scratch☆21Mar 25, 2025Updated last year
- Analyzing LLM Alignment via Token distribution shift☆17Jan 26, 2024Updated 2 years ago
- ☆23May 20, 2025Updated 10 months ago
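- Sample code from the first-place team in the Baidu AI Security Adversarial Competition, modified from the officially provided PGD; the main content is L2-PGD + EOT.☆11Mar 17, 2021Updated 5 years ago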
- Craft poisoned data using MetaPoison☆54Apr 5, 2021Updated 4 years ago
- ☆22Dec 22, 2024Updated last year
- Official implementation of the paper “Reconsidering Overthinking: Penalizing Internal and External Redundancy in CoT Reasoning”☆20Aug 20, 2025Updated 7 months ago
- The implementation of the Block Coordinate Regularization by Denoising (BC-RED) algorithm (NeurIPS 2019)☆10Oct 15, 2019Updated 6 years ago
- Code for NeurIPS 2024 Paper "Fight Back Against Jailbreaking via Prompt Adversarial Tuning"☆22May 6, 2025Updated 10 months ago
- [CVPRW'22] A privacy attack that exploits Adversarial Training models to compromise the privacy of Federated Learning systems.☆12Jul 7, 2022Updated 3 years ago
- REINFORCE Adversarial Attacks on Large Language Models: An Adaptive, Distributional, and Semantic Objective☆21Feb 28, 2025Updated last year
- Code for NDSS '25 paper "Passive Inference Attacks on Split Learning via Adversarial Regularization"☆13Sep 16, 2024Updated last year
- ☆18Jul 2, 2023Updated 2 years ago
- Official repository of "Distort, Distract, Decode: Instruction-Tuned Model Can Refine its Response from Noisy Instructions", ICLR 2024 Sp…☆21Mar 7, 2024Updated 2 years ago
- Machine learning enabled dropper☆28May 1, 2023Updated 2 years ago
- Final project for the third-year Natural Language Processing course at UCAS☆12Jun 25, 2023Updated 2 years ago
- NTIRE 2020 Real Image Denoising Challenge - ZJU231☆10Mar 26, 2020Updated 6 years ago