bboylyg / BackdoorLLMLinks

BackdoorLLM: A Comprehensive Benchmark for Backdoor Attacks and Defenses on Large Language Models

☆183

Alternatives and similar repositories for BackdoorLLM

Users that are interested in BackdoorLLM are comparing it to the libraries listed below

Sorting:

sleeepeer / PoisonedRAG
[USENIX Security 2025] PoisonedRAG: Knowledge Corruption Attacks to Retrieval-Augmented Generation of Large Language Models
☆174Updated 5 months ago
Lyz1213 / BadEdit
☆32Updated 9 months ago
Django-Jiang / BadChain
[ICLR24] Official Repo of BadChain: Backdoor Chain-of-Thought Prompting for Large Language Models
☆36Updated last year
BHui97 / PLeak
☆60Updated 7 months ago
RICommunity / TAP
TAP: An automated jailbreaking method for black-box LLMs
☆180Updated 7 months ago
SheltonLiu-N / AutoDAN
[ICLR 2024] The official implementation of our ICLR2024 paper "AutoDAN: Generating Stealthy Jailbreak Prompts on Aligned Large Language M…
☆367Updated 6 months ago
ThuCCSLab / JailbreakEval
[NDSS'25 Best Technical Poster] A collection of automated evaluators for assessing jailbreak attempts.
☆165Updated 4 months ago
AI-secure / AgentPoison
[NeurIPS 2024] Official implementation for "AgentPoison: Red-teaming LLM Agents via Memory or Knowledge Base Backdoor Poisoning"
☆134Updated 3 months ago
lancopku / agent-backdoor-attacks
Code&Data for the paper "Watch Out for Your Agents! Investigating Backdoor Threats to LLM-Based Agents" [NeurIPS 2024]
☆84Updated 10 months ago
pasquini-dario / LLM_NeuralExec
Code to generate NeuralExecs (prompt injection for LLMs)
☆22Updated 8 months ago
AI45Lab / ActorAttack
☆96Updated 6 months ago
ThuCCSLab / FigStep
[AAAI'25 (Oral)] Jailbreaking Large Vision-language Models via Typographic Visual Prompts
☆160Updated last month
JailbreakBench / jailbreakbench
JailbreakBench: An Open Robustness Benchmark for Jailbreaking Language Models [NeurIPS 2024 Datasets and Benchmarks Track]
☆379Updated 4 months ago
xingjunm / Awesome-Large-Model-Safety
Safety at Scale: A Comprehensive Survey of Large Model Safety
☆183Updated 5 months ago
WUSTL-CSPL / LLMJailbreak
☆35Updated 10 months ago
git-disl / awesome_LLM-harmful-fine-tuning-papers
A survey on harmful fine-tuning attack for large language model
☆196Updated this week
usail-hkust / JailTrickBench
Bag of Tricks: Benchmarking of Jailbreak Attacks on LLMs. Empirical tricks for LLM Jailbreaking. (NeurIPS 2024)
☆144Updated 8 months ago
agiresearch / ASB
Agent Security Bench (ASB)
☆100Updated last month
GraySwanAI / nanoGCG
A fast + lightweight implementation of the GCG algorithm in PyTorch
☆263Updated 2 months ago
MiracleHH / CBA
Composite Backdoor Attacks Against Large Language Models
☆16Updated last year
chen37058 / Red-Team-Arxiv-Paper-Update
Awesome Jailbreak, red teaming arxiv papers (Automatically Update Every 12th hours)
☆44Updated this week
zhangrui4041 / Instruction_Backdoor_Attack
☆25Updated 11 months ago
phycholosogy / RAG-privacy
The code for paper "The Good and The Bad: Exploring Privacy Issues in Retrieval-Augmented Generation (RAG)", exploring the privacy risk o…
☆53Updated 6 months ago
mignonjia / TS_watermark
☆16Updated 2 months ago
NY1024 / Foundation-Model-Paper-Notes
☆58Updated 2 months ago
qingjiesjtu / USC
This is the code repository of our submission: Understanding the Dark Side of LLMs’ Intrinsic Self-Correction.
☆58Updated 7 months ago
LLMSecurity / MasterKey
MASTERKEY is a framework designed to explore and exploit vulnerabilities in large language model chatbots by automating jailbreak attacks…
☆26Updated 10 months ago
chawins / llm-sp
Papers and resources related to the security and privacy of LLMs 🤖
☆520Updated last month
Trustworthy-AI-Group / Adversarial_Examples_Papers
A list of recent papers about adversarial learning
☆192Updated this week
penghui-yang / awesome-data-poisoning-and-backdoor-attacks
A curated list of papers & resources linked to data poisoning, backdoor attacks and defenses against them (no longer maintained)
☆266Updated 6 months ago