Aatrox103 / SAPLinks

☆45

Alternatives and similar repositories for SAP

Users that are interested in SAP are comparing it to the libraries listed below

Sorting:

GodXuxilie / PromptAttack
An LLM can Fool Itself: A Prompt-Based Adversarial Attack (ICLR 2024)
☆94Updated 6 months ago
DAMO-NLP-SG / multilingual-safety-for-LLMs
[ICLR 2024]Data for "Multilingual Jailbreak Challenges in Large Language Models"
☆79Updated last year
declare-lab / red-instruct
Codes and datasets of the paper Red-Teaming Large Language Models using Chain of Utterances for Safety-Alignment
☆103Updated last year
Yu-Fangxu / COLD-Attack
[ICML 2024] COLD-Attack: Jailbreaking LLMs with Stealthiness and Controllability
☆160Updated 7 months ago
SheltonLiu-N / Universal-Prompt-Injection
The official implementation of our pre-print paper "Automatic and Universal Prompt Injection Attacks against Large Language Models".
☆51Updated 9 months ago
HKUST-KnowComp / LLM-Multistep-Jailbreak
Code for Findings-EMNLP 2023 paper: Multi-step Jailbreaking Privacy Attacks on ChatGPT
☆34Updated last year
lancopku / agent-backdoor-attacks
Code&Data for the paper "Watch Out for Your Agents! Investigating Backdoor Threats to LLM-Based Agents" [NeurIPS 2024]
☆84Updated 10 months ago
niconi19 / LLM-Conversation-Safety
[NAACL2024] Attacks, Defenses and Evaluations for LLM Conversation Safety: A Survey
☆106Updated 11 months ago
YihanWang617 / LLM-Jailbreaking-Defense-Backtranslation
Code for paper "Defending aginast LLM Jailbreaking via Backtranslation"
☆30Updated 11 months ago
uw-nsl / SafeDecoding
Official Repository for ACL 2024 Paper SafeDecoding: Defending against Jailbreak Attacks via Safety-Aware Decoding
☆140Updated last year
Princeton-SysML / Jailbreak_LLM
☆178Updated last year
vinid / safety-tuned-llamas
ICLR2024 Paper. Showing properties of safety tuning and exaggerated safety.
☆85Updated last year
chujiezheng / LLM-Safeguard
Official repository for ICML 2024 paper "On Prompt-Driven Safeguarding for Large Language Models"
☆94Updated 2 months ago
liuchengyuan123 / CPAD
The official dataset of paper "Goal-Oriented Prompt Attack and Safety Evaluation for LLMs".
☆18Updated last year
rotaryhammer / code-autodan
An unofficial implementation of AutoDAN attack on LLMs (arXiv:2310.15140)
☆42Updated last year
OSU-NLP-Group / AmpleGCG
AmpleGCG: Learning a Universal and Transferable Generator of Adversarial Attacks on Both Open and Closed LLM
☆69Updated 9 months ago
huizhang-L / CodeChameleon
☆22Updated last year
uiuc-kang-lab / InjecAgent
☆70Updated last year
ThuCCSLab / JailbreakEval
[NDSS'25 Best Technical Poster] A collection of automated evaluators for assessing jailbreak attempts.
☆165Updated 4 months ago
wang2226 / Trojan-Activation-Attack
[CIKM 2024] Trojan Activation Attack: Attack Large Language Models using Activation Steering for Safety-Alignment.
☆25Updated last year
usail-hkust / JailTrickBench
Bag of Tricks: Benchmarking of Jailbreak Attacks on LLMs. Empirical tricks for LLM Jailbreaking. (NeurIPS 2024)
☆144Updated 8 months ago
YihanWang617 / llm-jailbreaking-defense
A lightweight library for large laguage model (LLM) jailbreaking defense.
☆54Updated 9 months ago
OSU-NLP-Group / AgentSafety
☆99Updated 3 months ago
wagner-group / prompt-injection-defense
Fine-tuning base models to build robust task-specific models
☆31Updated last year
xirui-li / DrAttack
Official implementation of paper: DrAttack: Prompt Decomposition and Reconstruction Makes Powerful LLM Jailbreakers
☆56Updated 11 months ago
SheltonLiu-N / AutoDAN
[ICLR 2024] The official implementation of our ICLR2024 paper "AutoDAN: Generating Stealthy Jailbreak Prompts on Aligned Large Language M…
☆367Updated 6 months ago
facebookresearch / advprompter
Official implementation of AdvPrompter https//arxiv.org/abs/2404.16873
☆160Updated last year
thunlp / Advbench
Code and data of the EMNLP 2022 paper "Why Should Adversarial Perturbations be Imperceptible? Rethink the Research Paradigm in Adversaria…
☆53Updated 2 years ago
thu-coai / JailbreakDefense_GoalPriority
[ACL 2024] Defending Large Language Models Against Jailbreaking Attacks Through Goal Prioritization
☆26Updated last year
AI45Lab / ActorAttack
☆96Updated 6 months ago