DPamK/BadAgent

☆33

Alternatives and similar repositories for BadAgent

Users that are interested in BadAgent are comparing it to the libraries listed below.

lancopku / agent-backdoor-attacks
Code&Data for the paper "Watch Out for Your Agents! Investigating Backdoor Threats to LLM-Based Agents" [NeurIPS 2024]
☆115 · Sep 27, 2024 · Updated last year
xzhou98 / GBTL-attack
☆18 · Jun 4, 2025 · Updated last year
uiuc-kang-lab / AdaptiveAttackAgent
☆39 · Mar 12, 2025 · Updated last year
whfeLingYu / DemonAgent
☆18 · Apr 1, 2025 · Updated last year
Yunhao-Feng / BackdoorAgent
BackdoorAgent is a stage-aware framework and benchmark that instruments LLM-agent workflows (planning, memory, tools) to systematically i…
☆42 · Mar 16, 2026 · Updated 4 months ago
uw-nsl / CleanGen
[EMNLP 24] Official Implementation of CLEANGEN: Mitigating Backdoor Attacks for Generation Tasks in Large Language Models
☆19 · Mar 9, 2025 · Updated last year
Vincent-HKUSTGZ / PEFTGuard
Official repository for PEFTGuard: Detecting Backdoor Attacks Against Parameter-Efficient Fine-Tuning, accepted at 2025 IEEE Symposium on…
☆18 · Jul 4, 2025 · Updated last year
shaoshuo-ss / EaaW
[NDSS 2025] Official code for our paper "Explanation as a Watermark: Towards Harmless and Multi-bit Model Ownership Verification via Wate…
☆45 · Nov 5, 2024 · Updated last year
uiuc-kang-lab / InjecAgent
☆152 · Jul 2, 2024 · Updated 2 years ago
TanqiuJiang / AgentLAB
The official implementation of the paper "AgentLAB: Benchmarking LLM Agents against Long-Horizon Attacks"
☆26 · Jun 1, 2026 · Updated last month
OSU-NLP-Group / AgentAttack
☆22 · Oct 25, 2024 · Updated last year
lapisrocks / rpo
Official repository for "Robust Prompt Optimization for Defending Language Models Against Jailbreaking Attacks"
☆62 · Aug 8, 2024 · Updated last year
datasec-lab / CodeBreaker
[USENIX Security '24] An LLM-Assisted Easy-to-Trigger Backdoor Attack on Code Completion Models: Injecting Disguised Vulnerabilities agai…
☆60 · Mar 22, 2025 · Updated last year
Gwinhen / MOTH
This is the implementation for IEEE S&P 2022 paper "Model Orthogonalization: Class Distance Hardening in Neural Networks for Better Secur…
☆11 · Aug 24, 2022 · Updated 3 years ago
lancopku / Embedding-Poisoning
Code for the paper "Be Careful about Poisoned Word Embeddings: Exploring the Vulnerability of the Embedding Layers in NLP Models" (NAACL-…
☆45 · Jul 26, 2021 · Updated 4 years ago
arpitbansal297 / Certified_Watermarks
☆16 · Jul 17, 2022 · Updated 4 years ago
bigglesworthnotacat / LLM-Steg
[ICLR 2026 Oral] Invisible Safety Threat: Malicious Finetuning for LLM via Steganography
☆20 · Mar 22, 2026 · Updated 3 months ago
Tele-EVOL / TeleAI-Safety
☆27 · Jan 5, 2026 · Updated 6 months ago
xjzzzzzzzz / MCPSafety
☆22 · Dec 18, 2025 · Updated 7 months ago
aisa-group / skill-inject
Skill-Inject: Measuring Agent Vulnerability to Skill File Attacks
☆88 · Jul 1, 2026 · Updated 2 weeks ago
UMBCvision / Universal-Litmus-Patterns
Official Repository for the CVPR 2020 paper "Universal Litmus Patterns: Revealing Backdoor Attacks in CNNs"
☆45 · Oct 24, 2023 · Updated 2 years ago
Greysahy / ipiguard
[EMNLP 2025 Oral] IPIGuard: A Novel Tool Dependency Graph-Based Defense Against Indirect Prompt Injection in LLM Agents
☆22 · Sep 16, 2025 · Updated 10 months ago
SewoongLab / spectre-defense
Defending Against Backdoor Attacks Using Robust Covariance Estimation
☆22 · Jul 12, 2021 · Updated 5 years ago
SaFo-Lab / DRIFT
[NeurIPS 2025] The official implementation of the paper "DRIFT: Dynamic Rule-Based Defense with Injection Isolation for Securing LLM Agen…
☆58 · Updated this week
PurduePAML / Exray
☆12 · May 27, 2022 · Updated 4 years ago
THU-BPM / Watermark-Radioactivity-Attack
[ACL 2025 Main] Code and data for paper "Can LLM Watermarks Robustly Prevent Unauthorized Knowledge Distillation?"
☆23 · Jun 18, 2025 · Updated last year
reds-lab / BEEAR
This is the official GitHub repo for our paper: "BEEAR: Embedding-based Adversarial Removal of Safety Backdoors in Instruction-tuned Lang…
☆23 · Jul 3, 2024 · Updated 2 years ago
Wangyuhao06 / IKEA
Implementation of the Implicit Knowledge Extraction Attack.
☆24 · Jul 14, 2026 · Updated last week
shaoshuo-ss / FedTracker
[TDSC 2024] Official code for our paper "FedTracker: Furnishing Ownership Verification and Traceability for Federated Learning Model"
☆23 · May 14, 2025 · Updated last year
wang2226 / Trojan-Activation-Attack
[CIKM 2024] Trojan Activation Attack: Attack Large Language Models using Activation Steering for Safety-Alignment.
☆30 · Jul 29, 2024 · Updated last year
InvokerStark / OverKill
☆15 · Jun 13, 2024 · Updated 2 years ago
facebookresearch / SecAlign
Repo for the research paper "SecAlign: Defending Against Prompt Injection with Preference Optimization"
☆98 · Jul 2, 2026 · Updated 2 weeks ago
eth-sri / smoothing-ensembles
[ICLR 2022] Boosting Randomized Smoothing with Variance Reduced Classifiers
☆11 · Mar 29, 2022 · Updated 4 years ago
dunzeng / FedAWARE
Code for AISTATS'25 paper - On the Power of Adaptive Weighted Aggregation in Heterogeneous Federated Learning and Beyond
☆14 · Sep 23, 2025 · Updated 9 months ago
clearloveclearlove / BEAT
☆15 · Feb 26, 2025 · Updated last year
RPC2 / AutoInject
☆20 · Jun 12, 2026 · Updated last month
Kim-Minseon / APGP
Automatic Jailbreaking of the Text-to-Image Generative AI Systems
☆15 · Jun 23, 2024 · Updated 2 years ago
dongsenzhang / MSB
☆38 · Mar 24, 2026 · Updated 3 months ago
alevine0 / DPA
Code for the paper "Deep Partition Aggregation: Provable Defenses against General Poisoning Attacks"
☆14 · Aug 22, 2022 · Updated 3 years ago