zhangrui4041/Instruction_Backdoor_Attack

zhangrui4041 / Instruction_Backdoor_Attack

☆26

Alternatives and similar repositories for Instruction_Backdoor_Attack

Users who are interested in Instruction_Backdoor_Attack are comparing it to the repositories listed below.

MiracleHH / CBA
Composite Backdoor Attacks Against Large Language Models
☆22 · Apr 12, 2024 · Updated last year
Gwinhen / MOTH
This is the implementation for IEEE S&P 2022 paper "Model Orthogonalization: Class Distance Hardening in Neural Networks for Better Secur…
☆11 · Aug 24, 2022 · Updated 3 years ago
leix28 / prompt-universal-vulnerability
Implementation of the paper "Exploring the Universal Vulnerability of Prompt-based Learning Paradigm" on Findings of NAACL 2022
☆32 · Jul 11, 2022 · Updated 3 years ago
datasec-lab / CodeBreaker
[USENIX Security '24] An LLM-Assisted Easy-to-Trigger Backdoor Attack on Code Completion Models: Injecting Disguised Vulnerabilities agai…
☆57 · Mar 22, 2025 · Updated 11 months ago
UNHSAILLab / working-memory-attack-on-llms
Working Memory Attack on LLMs
☆17 · May 27, 2025 · Updated 9 months ago
shuaizhao95 / ICLAttack
ICL backdoor attack
☆17 · Nov 4, 2024 · Updated last year
TDteach / Demon-in-the-Variant
☆13 · Oct 21, 2021 · Updated 4 years ago
xingpz2008 / TriFSS
Efficient Secure Computation Protocols for Trigonometric Functions via Function Secret Sharing
☆20 · Nov 8, 2022 · Updated 3 years ago
xingpz2008 / dealerless-FSS_public
Implementation of Distributed function secret sharing and applications
☆16 · May 10, 2025 · Updated 9 months ago
AI-secure / TextGuard
TextGuard: Provable Defense against Backdoor Attacks on Text Classification
☆14 · Nov 7, 2023 · Updated 2 years ago
reds-lab / BEEAR
This is the official GitHub repo for our paper: "BEEAR: Embedding-based Adversarial Removal of Safety Backdoors in Instruction-tuned Lang…
☆22 · Jul 3, 2024 · Updated last year
thestephencasper / latent_adversarial_training
☆24 · Jul 25, 2024 · Updated last year
Huiying-Li / blacklight
☆20 · Jun 24, 2022 · Updated 3 years ago
chichidd / llm-lora-trojan
Code for paper "The Philosopher’s Stone: Trojaning Plugins of Large Language Models"
☆27 · Sep 11, 2024 · Updated last year
lancopku / agent-backdoor-attacks
Code&Data for the paper "Watch Out for Your Agents! Investigating Backdoor Threats to LLM-Based Agents" [NeurIPS 2024]
☆109 · Sep 27, 2024 · Updated last year
thunlp / OpenBackdoor
An open-source toolkit for textual backdoor attack and defense (NeurIPS 2022 D&B, Spotlight)
☆200 · Apr 10, 2023 · Updated 2 years ago
lancopku / RAP
Code for the paper "RAP: Robustness-Aware Perturbations for Defending against Backdoor Attacks on NLP Models" (EMNLP 2021)
☆25 · Oct 21, 2021 · Updated 4 years ago
lancopku / SOS
Code for the paper "Rethinking Stealthiness of Backdoor Attack against NLP Models" (ACL-IJCNLP 2021)
☆24 · Dec 9, 2021 · Updated 4 years ago
meng-wenlong / LMSanitator
☆28 · Aug 21, 2023 · Updated 2 years ago
EvanXiaa / Awesome-LLM_For-SE-Sec-Papers
☆31 · Sep 22, 2024 · Updated last year
WUSTL-CSPL / LLMJailbreak
☆37 · Sep 30, 2024 · Updated last year
BHui97 / PrivateFL
☆46 · Aug 4, 2023 · Updated 2 years ago
UCF-Lou-Lab-PET / Private-Data-Prune
☆16 · Nov 8, 2024 · Updated last year
cnut1648 / Model-Fingerprint
Fingerprint large language models
☆49 · Jul 11, 2024 · Updated last year
t-maho / SurFree
SurFree: a fast surrogate-free black-box attack
☆44 · Jun 27, 2024 · Updated last year
VulDet / FVD-DPM
A deep learning model for identifying and localizing vulnerabilities in C/C++ source code.
☆12 · Jan 18, 2025 · Updated last year
RU-System-Software-and-Security / NONE
☆10 · Oct 31, 2022 · Updated 3 years ago
Daniel-Ayz / CHeaT
Cloak, Honey, Trap: Proactive Defenses Against LLM Agents
☆16 · Jul 9, 2025 · Updated 8 months ago
intellisec / xai-backdoors
Disguising Attacks with Explanation-Aware Backdoors (IEEE S&P 2023)
☆12 · Jan 3, 2026 · Updated 2 months ago
HyeonjeongHa / MM-PoisonRAG
Official PyTorch implementation of "MM-PoisonRAG: Disrupting Multimodal RAG with Local and Global Poisoning Attacks"
☆12 · Dec 4, 2025 · Updated 3 months ago
CryptoAPI-Bench / CryptoAPI-Bench
☆11 · Jan 10, 2024 · Updated 2 years ago
UACalc / uacalcsrc
The Universal Algebra Calculator
☆16 · Jun 11, 2022 · Updated 3 years ago
KatherLab / prompt_injection_attacks
☆13 · Dec 28, 2024 · Updated last year
Golevka / reviz
Visualize NFA and DFA constructed from a regular expression
☆18 · Feb 10, 2017 · Updated 9 years ago
DependableSystemsLab / MIA_defense_HAMP
Code for the paper "Overconfidence is a Dangerous Thing: Mitigating Membership Inference Attacks by Enforcing Less Confident Prediction" …
☆12 · Sep 6, 2023 · Updated 2 years ago
piotrjurkiewicz / topohub
Repository of reference Gabriel graph, Internet Topology Zoo, SNDlib, CAIDA and synthetic backbone topologies for networking research
☆12 · Sep 30, 2025 · Updated 5 months ago
ZiangYan / pda.pytorch
Implementation of our ICLR 2021 paper: Policy-Driven Attack: Learning to Query for Hard-label Black-box Adversarial Examples.
☆11 · Mar 9, 2021 · Updated 5 years ago
Maryeon / whiten_mtd
Official repository of paper "Let All be Whitened: Multi-teacher Distillation for Efficient Visual Retrieval"
☆10 · Dec 20, 2023 · Updated 2 years ago
fzwark / Secure_LLM_System
☆14 · Mar 9, 2025 · Updated last year