Unofficial implementation of "Backdooring Instruction-Tuned Large Language Models with Virtual Prompt Injection"
☆27 · Jul 6, 2024 · Updated last year
Alternatives and similar repositories for virtual-prompt-injection
Users who are interested in virtual-prompt-injection are comparing it to the libraries listed below.
- Working Memory Attack on LLMs ☆17 · May 27, 2025 · Updated 9 months ago
- Code for the paper "Exploring Backdoor Vulnerabilities of Chat Models" ☆19 · Apr 13, 2024 · Updated last year
- ☆11 · Oct 3, 2021 · Updated 4 years ago
- [ICLR 2025] Code & data for the paper "Super(ficial)-alignment: Strong Models May Deceive Weak Models in Weak-to-Strong Generalization" ☆14 · Jun 21, 2024 · Updated last year
- ICL backdoor attack ☆17 · Nov 4, 2024 · Updated last year
- ☆15 · Jul 8, 2023 · Updated 2 years ago
- ☆58 · May 30, 2024 · Updated last year
- ☆32 · Sep 3, 2024 · Updated last year
- ☆22 · Sep 2, 2025 · Updated 6 months ago
- Code & data for the paper "Watch Out for Your Agents! Investigating Backdoor Threats to LLM-Based Agents" [NeurIPS 2024] ☆112 · Sep 27, 2024 · Updated last year
- [USENIX Security '24] REMARK-LLM: A robust and efficient watermarking framework for generative large language models ☆27 · Oct 23, 2024 · Updated last year
- Evaluating Durability: Benchmark Insights into Multimodal Watermarking ☆12 · Jun 7, 2024 · Updated last year
- [CIKM 2024] Trojan Activation Attack: Attack Large Language Models using Activation Steering for Safety-Alignment ☆29 · Jul 29, 2024 · Updated last year
- ☆34 · Aug 11, 2022 · Updated 3 years ago
- ☆592 · Jul 4, 2025 · Updated 8 months ago
- ☆13 · Sep 8, 2024 · Updated last year
- ☆28 · Aug 21, 2023 · Updated 2 years ago
- Code for the paper "RemovalNet: DNN model fingerprinting removal attack", IEEE TDSC 2023 ☆10 · Nov 27, 2023 · Updated 2 years ago
- Code and dataset for the paper "Can Editing LLMs Inject Harm?" ☆21 · Dec 26, 2025 · Updated 2 months ago
- Security Attacks on LLM-based Code Completion Tools (AAAI 2025) ☆21 · Dec 31, 2025 · Updated 2 months ago
- TaCo: Enhancing Cross-Lingual Transfer for Low-Resource Languages in LLMs through Translation-Assisted Chain-of-Thought Processes ☆14 · Jul 1, 2025 · Updated 8 months ago
- Official repository for Siren, a project aimed at understanding and mitigating harmful behaviors in large language models … ☆15 · Sep 12, 2025 · Updated 6 months ago
- Watermarking LLM papers, up to date ☆11 · Dec 17, 2023 · Updated 2 years ago
- Official repository for the ACL 2024 paper "SafeDecoding: Defending against Jailbreak Attacks via Safety-Aware Decoding" ☆151 · Jul 19, 2024 · Updated last year
- ☆14 · May 22, 2017 · Updated 8 years ago
- Code for the paper "PoisonPrompt: Backdoor Attack on Prompt-based Large Language Models", IEEE ICASSP 2024. Demo: //124.220.228.133:11107 ☆20 · Aug 10, 2024 · Updated last year
- The corresponding code from our paper "Making Reasoning Matter: Measuring and Improving Faithfulness of Chain-of-Thought Reasoning…" ☆13 · Jan 14, 2026 · Updated 2 months ago
- ☆52 · Oct 23, 2023 · Updated 2 years ago
- [ICML 2023] Protecting Language Generation Models via Invisible Watermarking ☆13 · Sep 8, 2023 · Updated 2 years ago
- ☆13 · Dec 28, 2024 · Updated last year
- ☆11 · Apr 17, 2023 · Updated 2 years ago
- A Python implementation of the concepts in the book "Reinforcement Learning: An Introduction" by R. S. Sutton and A. G. Barto ☆21 · Jul 13, 2020 · Updated 5 years ago
- ☆16 · Nov 8, 2024 · Updated last year
- AlpaGasus2-QLoRA: based on LLaMA2 with the AlpaGasus mechanism, using QLoRA ☆15 · Nov 22, 2023 · Updated 2 years ago
- Target Agnostic Attack on Deep Models: Exploiting Security Vulnerabilities of Transfer Learning ☆10 · Jul 2, 2019 · Updated 6 years ago
- ☆19 · Mar 26, 2022 · Updated 3 years ago
- Code for the paper "Universal Jailbreak Backdoors from Poisoned Human Feedback" ☆65 · Apr 24, 2024 · Updated last year
- ☆18 · Oct 7, 2022 · Updated 3 years ago
- A reproduction of the zero-shot, assumption-free image-denoising model from the CVPR 2023 paper Zero-shot Noise2Noise ☆23 · Jan 5, 2024 · Updated 2 years ago