AI-secure / AgentPoisonLinks

[NeurIPS 2024] Official implementation for "AgentPoison: Red-teaming LLM Agents via Memory or Knowledge Base Backdoor Poisoning"

☆160

Alternatives and similar repositories for AgentPoison

Users that are interested in AgentPoison are comparing it to the libraries listed below

Sorting:

lancopku / agent-backdoor-attacks
Code&Data for the paper "Watch Out for Your Agents! Investigating Backdoor Threats to LLM-Based Agents" [NeurIPS 2024]
☆93Updated last year
agiresearch / ASB
Agent Security Bench (ASB)
☆137Updated 3 weeks ago
ChenWu98 / agent-attack
[ICLR 2025] Dissecting adversarial robustness of multimodal language model agents
☆110Updated 8 months ago
usail-hkust / JailTrickBench
Bag of Tricks: Benchmarking of Jailbreak Attacks on LLMs. Empirical tricks for LLM Jailbreaking. (NeurIPS 2024)
☆152Updated 11 months ago
uiuc-kang-lab / InjecAgent
☆83Updated last year
pasquini-dario / LLM_NeuralExec
Code to generate NeuralExecs (prompt injection for LLMs)
☆25Updated 3 weeks ago
BHui97 / PLeak
☆65Updated 10 months ago
SheltonLiu-N / Universal-Prompt-Injection
The official implementation of our pre-print paper "Automatic and Universal Prompt Injection Attacks against Large Language Models".
☆60Updated last year
AI45Lab / ActorAttack
☆108Updated 8 months ago
AI-secure / RedCode
[NeurIPS'24] RedCode: Risky Code Execution and Generation Benchmark for Code Agents
☆52Updated 3 months ago
CryptoAILab / JailbreakEval
[NDSS'25 Best Technical Poster] A collection of automated evaluators for assessing jailbreak attempts.
☆172Updated 6 months ago
GodXuxilie / PromptAttack
An LLM can Fool Itself: A Prompt-Based Adversarial Attack (ICLR 2024)
☆102Updated 9 months ago
uw-nsl / SafeDecoding
Official Repository for ACL 2024 Paper SafeDecoding: Defending against Jailbreak Attacks via Safety-Aware Decoding
☆146Updated last year
facebookresearch / SecAlign
Repo for the research paper "SecAlign: Defending Against Prompt Injection with Preference Optimization"
☆72Updated 3 months ago
OSU-NLP-Group / AgentSafety
☆119Updated 5 months ago
sleeepeer / PoisonedRAG
[USENIX Security 2025] PoisonedRAG: Knowledge Corruption Attacks to Retrieval-Augmented Generation of Large Language Models
☆209Updated 8 months ago
facebookresearch / advprompter
Official implementation of AdvPrompter https//arxiv.org/abs/2404.16873
☆169Updated last year
YihanWang617 / llm-jailbreaking-defense
A lightweight library for large laguage model (LLM) jailbreaking defense.
☆58Updated last month
OSU-NLP-Group / AmpleGCG
AmpleGCG: Learning a Universal and Transferable Generator of Adversarial Attacks on Both Open and Closed LLM
☆75Updated 11 months ago
Sizhe-Chen / StruQ
official implementation of [USENIX Sec'25] StruQ: Defending Against Prompt Injection with Structured Queries
☆48Updated 3 months ago
Allen-piexl / JailbreakZoo
☆153Updated last year
thu-coai / Agent-SafetyBench
☆69Updated 2 months ago
RICommunity / TAP
TAP: An automated jailbreaking method for black-box LLMs
☆194Updated 10 months ago
WhileBug / AwesomeLLMJailBreakPapers
Awesome LLM Jailbreak academic papers
☆111Updated last year
liu00222 / Open-Prompt-Injection
This repository provides a benchmark for prompt injection attacks and defenses
☆310Updated 2 weeks ago
OSU-NLP-Group / EIA_against_webagent
☆35Updated last year
HKUST-KnowComp / LLM-Multistep-Jailbreak
Code for Findings-EMNLP 2023 paper: Multi-step Jailbreaking Privacy Attacks on ChatGPT
☆35Updated 2 years ago
chen37058 / Red-Team-Arxiv-Paper-Update
Awesome Jailbreak, red teaming arxiv papers (Automatically Update Every 12th hours)
☆70Updated this week
Lyz1213 / BadEdit
☆36Updated last year
SheltonLiu-N / AutoDAN
[ICLR 2024] The official implementation of our ICLR2024 paper "AutoDAN: Generating Stealthy Jailbreak Prompts on Aligned Large Language M…
☆388Updated 9 months ago