lancopku/agent-backdoor-attacks

Code&Data for the paper "Watch Out for Your Agents! Investigating Backdoor Threats to LLM-Based Agents" [NeurIPS 2024]

☆116

Alternatives and similar repositories for agent-backdoor-attacks

Users who are interested in agent-backdoor-attacks are comparing it to the repositories listed below.

DPamK / BadAgent
☆33 · Feb 27, 2025 · Updated last year
bboylyg / BackdoorLLM
[NeurIPS 2025] BackdoorLLM: A Comprehensive Benchmark for Backdoor Attacks and Defenses on Large Language Models
☆316 · Mar 13, 2026 · Updated 4 months ago
AI-secure / AgentPoison
[NeurIPS 2024] Official implementation for "AgentPoison: Red-teaming LLM Agents via Memory or Knowledge Base Backdoor Poisoning"
☆231 · Jun 17, 2026 · Updated last month
agiresearch / ASB
Agent Security Bench (ASB)
☆273 · Apr 16, 2026 · Updated 3 months ago
ZhangZhuoSJTU / LINT
☆17 · Sep 4, 2024 · Updated last year
PurduePAML / PICCOLO
☆26 · Dec 1, 2022 · Updated 3 years ago
lapisrocks / rpo
Official repository for "Robust Prompt Optimization for Defending Language Models Against Jailbreaking Attacks"
☆62 · Aug 8, 2024 · Updated last year
AI-secure / Robustness-Against-Backdoor-Attacks
RAB: Provable Robustness Against Backdoor Attacks
☆40 · Oct 3, 2023 · Updated 2 years ago
ethz-spylab / agentdojo
A Dynamic Environment to Evaluate Attacks and Defenses for LLM Agents.
☆680 · Jun 2, 2026 · Updated last month
RPC2 / AutoInject
☆20 · Jun 12, 2026 · Updated last month
shaoshuo-ss / EaaW
[NDSS 2025] Official code for our paper "Explanation as a Watermark: Towards Harmless and Multi-bit Model Ownership Verification via Wate…
☆45 · Nov 5, 2024 · Updated last year
uiuc-kang-lab / InjecAgent
☆153 · Jul 2, 2024 · Updated 2 years ago
uw-nsl / CleanGen
[EMNLP 24] Official Implementation of CLEANGEN: Mitigating Backdoor Attacks for Generation Tasks in Large Language Models
☆19 · Mar 9, 2025 · Updated last year
Yunhao-Feng / BackdoorAgent
BackdoorAgent is a stage-aware framework and benchmark that instruments LLM-agent workflows (planning, memory, tools) to systematically i…
☆42 · Mar 16, 2026 · Updated 4 months ago
agiresearch / TrustAgent
TrustAgent: Towards Safe and Trustworthy LLM-based Agents
☆58 · Feb 7, 2025 · Updated last year
whfeLingYu / DemonAgent
☆18 · Apr 1, 2025 · Updated last year
ChenWu98 / agent-attack
[ICLR 2025] Dissecting adversarial robustness of multimodal language model agents
☆140 · Feb 19, 2025 · Updated last year
T1aNS1R / Evil-Geniuses
☆71 · Feb 4, 2024 · Updated 2 years ago
uiuc-kang-lab / AdaptiveAttackAgent
☆39 · Mar 12, 2025 · Updated last year
facebookresearch / SecAlign
Repo for the research paper "SecAlign: Defending Against Prompt Injection with Preference Optimization"
☆98 · Jul 2, 2026 · Updated 3 weeks ago
wegodev2 / virtual-prompt-injection
Unofficial implementation of "Backdooring Instruction-Tuned Large Language Models with Virtual Prompt Injection"
☆27 · Jul 6, 2024 · Updated 2 years ago
OSU-NLP-Group / EIA_against_webagent
☆40 · Oct 2, 2024 · Updated last year
SolidShen / BAIT
🔥🔥🔥 Detecting hidden backdoors in Large Language Models with only black-box access
☆57 · Jun 2, 2025 · Updated last year
AI-secure / AdvAgent
☆25 · May 28, 2025 · Updated last year
lishaofeng / NLP_Backdoor
Hidden backdoor attack on NLP systems
☆45 · Nov 14, 2021 · Updated 4 years ago
lancopku / SOS
Code for the paper "Rethinking Stealthiness of Backdoor Attack against NLP Models" (ACL-IJCNLP 2021)
☆24 · Dec 9, 2021 · Updated 4 years ago
papersPapers / BadPrompt
Code for the paper "BadPrompt: Backdoor Attacks on Continuous Prompts"
☆41 · Jul 8, 2024 · Updated 2 years ago
SCLBD / BackdoorBench
☆617 · Jul 4, 2025 · Updated last year
RUCAIBox / HADES
[ECCV'24 Oral] The official GitHub page for ''Images are Achilles' Heel of Alignment: Exploiting Visual Vulnerabilities for Jailbreaking …
☆39 · Oct 23, 2024 · Updated last year
S3IC-Lab / Odysseus
[NDSS 2026] Official repo for Odysseus: Jailbreaking Commercial Multimodal LLM-integrated Systems via Dual Steganography
☆59 · Mar 14, 2026 · Updated 4 months ago
OSU-NLP-Group / AgentSafety
☆192 · Oct 31, 2025 · Updated 8 months ago
MiracleHH / CBA
Composite Backdoor Attacks Against Large Language Models
☆25 · Apr 12, 2024 · Updated 2 years ago
Django-Jiang / BadChain
[ICLR24] Official Repo of BadChain: Backdoor Chain-of-Thought Prompting for Large Language Models
☆56 · Jul 24, 2024 · Updated 2 years ago
lancopku / Embedding-Poisoning
Code for the paper "Be Careful about Poisoned Word Embeddings: Exploring the Vulnerability of the Embedding Layers in NLP Models" (NAACL-…
☆44 · Jul 26, 2021 · Updated 4 years ago
bboylyg / RNP
Reconstructive Neuron Pruning for Backdoor Defense (ICML 2023)
☆40 · Dec 24, 2023 · Updated 2 years ago
aisa-group / skill-inject
Skill-Inject: Measuring Agent Vulnerability to Skill File Attacks
☆88 · Jul 1, 2026 · Updated 3 weeks ago
sleeepeer / PoisonedRAG
[USENIX Security 2025] PoisonedRAG: Knowledge Corruption Attacks to Retrieval-Augmented Generation of Large Language Models
☆286 · Jan 27, 2026 · Updated 5 months ago
AI4Good24 / PsySafe
☆53 · Feb 8, 2025 · Updated last year
liu00222 / Open-Prompt-Injection
This repository provides a benchmark for prompt injection attacks and defenses in LLMs
☆467 · Oct 29, 2025 · Updated 8 months ago