AI45Lab/CodeAttack

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/AI45Lab/CodeAttack)

AI45Lab / CodeAttack

[ACL 2024] CodeAttack: Revealing Safety Generalization Challenges of Large Language Models via Code Completion

☆61

Alternatives and similar repositories for CodeAttack

Users that are interested in CodeAttack are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

AI45Lab / ActorAttack
View on GitHub
☆135Jun 29, 2026Updated 3 weeks ago
AI45Lab / DEAN
View on GitHub
☆11Oct 25, 2024Updated last year
ZZR0 / CodeAttack
View on GitHub
Adversarial Attack for Pre-trained Code Models
☆10Jul 19, 2022Updated 4 years ago
RobustNLP / DeRTa
View on GitHub
A novel approach to improve the safety of large language models, enabling them to transition effectively from unsafe to safe state.
☆72May 22, 2025Updated last year
InvokerStark / OverKill
View on GitHub
☆15Jun 13, 2024Updated 2 years ago
Managed hosting for WordPress and PHP on Cloudways • Ad
Managed hosting for WordPress, Magento, Laravel, or PHP apps, on multiple cloud providers. Deploy in minutes on Cloudways by DigitalOcean.
rain152 / PAT
View on GitHub
[NeurIPS 2024] Fight Back Against Jailbreaking via Prompt Adversarial Tuning
☆11Oct 29, 2024Updated last year
AI45Lab / VLSBench
View on GitHub
[ACL 2025] Data and Code for Paper VLSBench: Unveiling Visual Leakage in Multimodal Safety
☆62Jul 21, 2025Updated last year
Jinxiaolong1129 / Foot-in-the-door-Jailbreak
View on GitHub
☆23May 14, 2025Updated last year
thu-coai / JailbreakDefense_GoalPriority
View on GitHub
[ACL 2024] Defending Large Language Models Against Jailbreaking Attacks Through Goal Prioritization
☆29Jul 9, 2024Updated 2 years ago
wicai24 / DOOR-Alignment
View on GitHub
☆20Apr 7, 2025Updated last year
patrickrchao / JailbreakingLLMs
View on GitHub
☆756Jul 2, 2025Updated last year
Sensente / Security-Attacks-on-LCCTs
View on GitHub
Security Attacks on LLM-based Code Completion Tools (AAAI 2025)
☆24Dec 31, 2025Updated 6 months ago
huizhang-L / CodeChameleon
View on GitHub
☆30Mar 20, 2024Updated 2 years ago
weiyezhimeng / SQL-Injection-Jailbreak
View on GitHub
☆22Jul 26, 2025Updated 11 months ago
End-to-end encrypted cloud storage - Proton Drive • Ad
Special offer: 40% Off Yearly / 80% Off First Month. Protect your most important files, photos, and documents from prying eyes.
PKU-ML / PAT
View on GitHub
Code for NeurIPS 2024 Paper "Fight Back Against Jailbreaking via Prompt Adversarial Tuning"
☆22May 6, 2025Updated last year
kztakemoto / simbaja
View on GitHub
All in How You Ask for It: Simple Black-Box Method for Jailbreak Attacks
☆17Apr 24, 2024Updated 2 years ago
EasyJailbreak / EasyJailbreak
View on GitHub
An easy-to-use Python framework to generate adversarial jailbreak prompts.
☆873Mar 30, 2026Updated 3 months ago
zhaoyiran924 / Probe-Sampling
View on GitHub
[NeurIPS 2024] Accelerating Greedy Coordinate Gradient and General Prompt Optimization via Probe Sampling
☆35Nov 8, 2024Updated last year
YihanWang617 / LLM-Jailbreaking-Defense-Backtranslation
View on GitHub
Code for paper "Defending aginast LLM Jailbreaking via Backtranslation"
☆34Aug 16, 2024Updated last year
Dtc7w3PQ / Response-Attack
View on GitHub
Official implementation of “Response Attack: Exploiting Contextual Priming to Jailbreak Large Language Models” (AAAI 2026).
☆37Mar 22, 2026Updated 3 months ago
Alibaba-AAIG / Strata-Sword
View on GitHub
The Strata-Sword is a hierarchical Chinese-English jailbreak safety benchmark based on quantified reasoning complexity, developed in-hous…
☆22Sep 3, 2025Updated 10 months ago
yuplin2333 / representation-space-jailbreak
View on GitHub
Code repo of our paper Towards Understanding Jailbreak Attacks in LLMs: A Representation Space Analysis (https://arxiv.org/abs/2406.10794…
☆24Jul 26, 2024Updated last year
zhxieml / remiss-jailbreak
View on GitHub
☆33Jun 24, 2024Updated 2 years ago
Serverless GPU API endpoints on Runpod - Get Bonus Credits • Ad
Skip the infrastructure headaches. Auto-scaling, pay-as-you-go, no-ops approach lets you focus on innovating your application.
chujiezheng / LLM-Safeguard
View on GitHub
Official repository for ICML 2024 paper "On Prompt-Driven Safeguarding for Large Language Models"
☆108May 20, 2025Updated last year
uw-nsl / SafeDecoding
View on GitHub
Official Repository for ACL 2024 Paper SafeDecoding: Defending against Jailbreak Attacks via Safety-Aware Decoding
☆154Jul 19, 2024Updated 2 years ago
YisenWang / dynamic_adv_training
View on GitHub
Code for ICML2019 Paper "On the Convergence and Robustness of Adversarial Training"
☆34Apr 28, 2020Updated 6 years ago
yueliu1999 / FlipAttack
View on GitHub
[ICML 2025] An official source code for paper "FlipAttack: Jailbreak LLMs via Flipping".
☆178May 2, 2025Updated last year
CUHK-Shenzhen-SE / D4C
View on GitHub
[ICSE'25] Aligning the Objective of LLM-based Program Repair
☆24Mar 8, 2025Updated last year
Princeton-SysML / Jailbreak_LLM
View on GitHub
☆203Nov 26, 2023Updated 2 years ago
wonderNefelibata / Awesome-LRM-Safety
View on GitHub
Awesome Large Reasoning Model(LRM) Safety.This repository is used to collect security-related research on large reasoning models such as …
☆84Updated this week
David-Li0406 / AI-Supervision-Risk
View on GitHub
☆21Mar 17, 2025Updated last year
sail-sg / imperceptible-jailbreaks
View on GitHub
[ArXiv 2025] Imperceptible Jailbreaking against Large Language Models
☆25Oct 7, 2025Updated 9 months ago
Open source password manager - Proton Pass • Ad
Securely store, share, and autofill your credentials with Proton Pass, the end-to-end encrypted password manager trusted by millions.
AI45Lab / DeepScan
View on GitHub
Diagnostic Framework for LLMs and MLLMs
☆39Mar 2, 2026Updated 4 months ago
qizhangli / Gradient-based-Jailbreak-Attacks
View on GitHub
Code for our NeurIPS 2024 paper Improved Generation of Adversarial Examples Against Safety-aligned LLMs
☆12Nov 7, 2024Updated last year
LLM-Tuning-Safety / LLMs-Finetuning-Safety
View on GitHub
We jailbreak GPT-3.5 Turbo’s safety guardrails by fine-tuning it on only 10 adversarially designed examples, at a cost of less than $0.20…
☆358Feb 23, 2024Updated 2 years ago
HanjiangHu / NBF-LLM
View on GitHub
The official code for "Steering Dialogue Dynamics for Robustness against Multi-turn Jailbreaking Attacks".
☆18Jun 24, 2026Updated 3 weeks ago
zhipeng-wei / EmojiAttack
View on GitHub
Emoji Attack [ICML 2025]
☆45Jul 15, 2025Updated last year
yuki-younai / MTSA
View on GitHub
offical implementation of MTSA: Multi-turn Safety Alignment for LLMs through Multi-round Red-teaming
☆16Jun 2, 2025Updated last year
AI45Lab / REEF
View on GitHub
The repository of the paper "REEF: Representation Encoding Fingerprints for Large Language Models," aims to protect the IP of open-source…
☆79Jan 16, 2025Updated last year