theshi-1128 / ABJ-AttackLinks

LLMs can be Dangerous Reasoners: Analyzing-based Jailbreak Attack on Large Language Models

☆21

Alternatives and similar repositories for ABJ-Attack

Users that are interested in ABJ-Attack are comparing it to the libraries listed below

Sorting:

uiuc-kang-lab / InjecAgent
☆71Updated last year
usail-hkust / JailTrickBench
Bag of Tricks: Benchmarking of Jailbreak Attacks on LLMs. Empirical tricks for LLM Jailbreaking. (NeurIPS 2024)
☆144Updated 9 months ago
Yu-Fangxu / COLD-Attack
[ICML 2024] COLD-Attack: Jailbreaking LLMs with Stealthiness and Controllability
☆166Updated 8 months ago
BHui97 / PLeak
☆61Updated 8 months ago
xirui-li / DrAttack
Official implementation of paper: DrAttack: Prompt Decomposition and Reconstruction Makes Powerful LLM Jailbreakers
☆58Updated last year
wagner-group / prompt-injection-defense
Fine-tuning base models to build robust task-specific models
☆32Updated last year
NJUNLP / ReNeLLM
The official implementation of our NAACL 2024 paper "A Wolf in Sheep’s Clothing: Generalized Nested Jailbreak Prompts can Fool Large Lang…
☆128Updated this week
vinusankars / BEAST
Implementation of BEAST adversarial attack for language models (ICML 2024)
☆91Updated last year
Aatrox103 / SAP
☆45Updated last year
AI-secure / AgentPoison
[NeurIPS 2024] Official implementation for "AgentPoison: Red-teaming LLM Agents via Memory or Knowledge Base Backdoor Poisoning"
☆141Updated 4 months ago
ThuCCSLab / JailbreakEval
[NDSS'25 Best Technical Poster] A collection of automated evaluators for assessing jailbreak attempts.
☆167Updated 5 months ago
eurekayuan / RigorLLM
Implementation for "RigorLLM: Resilient Guardrails for Large Language Models against Undesired Content"
☆22Updated last year
SheltonLiu-N / Universal-Prompt-Injection
The official implementation of our pre-print paper "Automatic and Universal Prompt Injection Attacks against Large Language Models".
☆54Updated 10 months ago
lancopku / agent-backdoor-attacks
Code&Data for the paper "Watch Out for Your Agents! Investigating Backdoor Threats to LLM-Based Agents" [NeurIPS 2024]
☆89Updated 11 months ago
AI45Lab / ActorAttack
☆101Updated 6 months ago
agiresearch / ASB
Agent Security Bench (ASB)
☆108Updated 2 months ago
uw-nsl / ArtPrompt
[ACL24] Official Repo of Paper `ArtPrompt: ASCII Art-based Jailbreak Attacks against Aligned LLMs`
☆80Updated 2 weeks ago
Django-Jiang / BadChain
[ICLR24] Official Repo of BadChain: Backdoor Chain-of-Thought Prompting for Large Language Models
☆37Updated last year
AIM-Intelligence / Automated-Multi-Turn-Jailbreaks
☆85Updated 9 months ago
OSU-NLP-Group / AgentSafety
☆106Updated 3 months ago
rucnyz / LeakAgent
☆20Updated this week
YancyKahn / CoA
Chain of Attack: a Semantic-Driven Contextual Multi-Turn attacker for LLM
☆35Updated 7 months ago
theshi-1128 / jailbreak-bench
The most comprehensive and accurate LLM jailbreak attack benchmark by far
☆21Updated 5 months ago
ThuCCSLab / FigStep
[AAAI'25 (Oral)] Jailbreaking Large Vision-language Models via Typographic Visual Prompts
☆167Updated 2 months ago
GodXuxilie / PromptAttack
An LLM can Fool Itself: A Prompt-Based Adversarial Attack (ICLR 2024)
☆96Updated 7 months ago
chen37058 / Red-Team-Arxiv-Paper-Update
Awesome Jailbreak, red teaming arxiv papers (Automatically Update Every 12th hours)
☆53Updated this week
SproutNan / AI-Safety_Benchmark
The official repository for guided jailbreak benchmark
☆18Updated last month
HKUST-KnowComp / LLM-Multistep-Jailbreak
Code for Findings-EMNLP 2023 paper: Multi-step Jailbreaking Privacy Attacks on ChatGPT
☆34Updated last year
chawins / pal
PAL: Proxy-Guided Black-Box Attack on Large Language Models
☆54Updated last year
tmlr-group / DeepInception
[arXiv:2311.03191] "DeepInception: Hypnotize Large Language Model to Be Jailbreaker"
☆161Updated last year