Code for our NeurIPS 2024 paper Improved Generation of Adversarial Examples Against Safety-aligned LLMs
☆12Nov 7, 2024Updated last year
Alternatives and similar repositories for Gradient-based-Jailbreak-Attacks
Users that are interested in Gradient-based-Jailbreak-Attacks are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.
Sorting:
- Code for Findings-EMNLP 2023 paper: Multi-step Jailbreaking Privacy Attacks on ChatGPT☆37Oct 15, 2023Updated 2 years ago
- [ACL 25] SafeChain: Safety of Language Models with Long Chain-of-Thought Reasoning Capabilities☆29Apr 2, 2025Updated last year
- A repo for LLM jailbreak☆14Sep 5, 2023Updated 2 years ago
- [Tensorflow] A Game Theoretic approach using GAN for Phishing URL synthesis and detection☆11Nov 14, 2022Updated 3 years ago
- ☆22Aug 8, 2025Updated 8 months ago
- Virtual machines for every use case on DigitalOcean • AdGet dependable uptime with 99.99% SLA, simple security tools, and predictable monthly pricing with DigitalOcean's virtual machines, called Droplets.
- This novel adversarial attack method that I have developed is called CEIA (Contextual Embedding Inversion Attack). This method is a sophi…☆16Nov 19, 2024Updated last year
- Repository for project codes related to turbulence modelling.☆17Sep 30, 2024Updated last year
- ☆17Mar 8, 2024Updated 2 years ago
- [CIKM 2024] Trojan Activation Attack: Attack Large Language Models using Activation Steering for Safety-Alignment.☆30Jul 29, 2024Updated last year
- All in How You Ask for It: Simple Black-Box Method for Jailbreak Attacks☆18Apr 24, 2024Updated last year
- Blogs that I'm actively following.☆15Sep 17, 2023Updated 2 years ago
- 2021年暨南大学CTF新生赛题目与源码☆15Dec 6, 2021Updated 4 years ago
- Materials for "Multi-property Steering of Large Language Models with Dynamic Activation Composition"☆14Nov 22, 2024Updated last year
- Identification of the Adversary from a Single Adversarial Example (ICML 2023)☆10Jul 15, 2024Updated last year
- 1-Click AI Models by DigitalOcean Gradient • AdDeploy popular AI models on DigitalOcean Gradient GPU virtual machines with just a single click. Zero configuration with optimized deployments.
- Links to publications that focus on the interpretation and analysis of in-context learning☆15Oct 17, 2024Updated last year
- Jailbreaking Leading Safety-Aligned LLMs with Simple Adaptive Attacks [ICLR 2025]☆381Jan 23, 2025Updated last year
- ☆10Apr 28, 2020Updated 5 years ago
- Advanced Machine Learning Fall 2020 Project Repository☆12Dec 12, 2020Updated 5 years ago
- CTF — 学习笔记&比赛题目&WP☆24Dec 9, 2023Updated 2 years ago
- [EMNLP'22] Textual Manifold-based Defense Against Natural Language Adversarial Examples☆11Apr 6, 2023Updated 3 years ago
- [NeurIPS 2024] Fight Back Against Jailbreaking via Prompt Adversarial Tuning☆11Oct 29, 2024Updated last year
- Our research proposes a novel MoGU framework that improves LLMs' safety while preserving their usability.☆18Jan 14, 2025Updated last year
- LLM手撕代码合集☆21Mar 25, 2025Updated last year
- Managed hosting for WordPress and PHP on Cloudways • AdManaged hosting for WordPress, Magento, Laravel, or PHP apps, on multiple cloud providers. Deploy in minutes on Cloudways by DigitalOcean.
- Analyzing LLM Alignment via Token distribution shift☆18Jan 26, 2024Updated 2 years ago
- ☆23May 20, 2025Updated 10 months ago
- 百度AI安全对抗赛第一名团队示例代码,基于官方给出的PGD修改,主要内容为L2-PGD+EOT。☆11Mar 17, 2021Updated 5 years ago
- Craft poisoned data using MetaPoison☆54Apr 5, 2021Updated 5 years ago
- ☆23Dec 22, 2024Updated last year
- Official implementation of the paper “Reconsidering Overthinking: Penalizing Internal and External Redundancy in CoT Reasoning”☆20Aug 20, 2025Updated 7 months ago
- The implementation of the Block Coordinate Regularization by Denoising (BC-RED) algorithm (NeurIPS 2019)☆10Oct 15, 2019Updated 6 years ago
- Code for NeurIPS 2024 Paper "Fight Back Against Jailbreaking via Prompt Adversarial Tuning"☆22May 6, 2025Updated 11 months ago
- [CVPRW'22] A privacy attack that exploits Adversarial Training models to compromise the privacy of Federated Learning systems.☆12Jul 7, 2022Updated 3 years ago
- Wordpress hosting with auto-scaling - Free Trial • AdFully Managed hosting for WordPress and WooCommerce businesses that need reliable, auto-scalable performance. Cloudways SafeUpdates now available.
- REINFORCE Adversarial Attacks on Large Language Models: An Adaptive, Distributional, and Semantic Objective☆23Feb 28, 2025Updated last year
- Code for NDSS '25 paper "Passive Inference Attacks on Split Learning via Adversarial Regularization"☆13Sep 16, 2024Updated last year
- ☆18Jul 2, 2023Updated 2 years ago
- Official repository of "Distort, Distract, Decode: Instruction-Tuned Model Can Refine its Response from Noisy Instructions", ICLR 2024 Sp…☆21Mar 7, 2024Updated 2 years ago
- UCAS大三自然语言处理课程大作业☆12Jun 25, 2023Updated 2 years ago
- Machine learning enabled dropper☆28May 1, 2023Updated 2 years ago
- NTIRE 2020 Real Image Denoising Challenge - ZJU231☆10Mar 26, 2020Updated 6 years ago