Unofficial implementation of "Backdooring Instruction-Tuned Large Language Models with Virtual Prompt Injection"
☆28Jul 6, 2024Updated last year
Alternatives and similar repositories for virtual-prompt-injection
Users that are interested in virtual-prompt-injection are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.
Sorting:
- Working Memory Attack on LLMs☆18May 27, 2025Updated 11 months ago
- AnyDoor: Test-Time Backdoor Attacks on Multimodal Large Language Models☆61Apr 8, 2024Updated 2 years ago
- Code for the paper "Exploring Backdoor Vulnerabilities of Chat Models"☆19Apr 13, 2024Updated 2 years ago
- ☆38Oct 17, 2024Updated last year
- [ICLR 2025] Code&Data for the paper "Super(ficial)-alignment: Strong Models May Deceive Weak Models in Weak-to-Strong Generalization"☆15Jun 21, 2024Updated last year
- Wordpress hosting with auto-scaling - Free Trial Offer • AdFully Managed hosting for WordPress and WooCommerce businesses that need reliable, auto-scalable performance. Cloudways SafeUpdates now available.
- ICL backdoor attack☆17Nov 4, 2024Updated last year
- ☆15Jul 8, 2023Updated 2 years ago
- ☆59May 30, 2024Updated last year
- ☆22Sep 2, 2025Updated 8 months ago
- Code&Data for the paper "Watch Out for Your Agents! Investigating Backdoor Threats to LLM-Based Agents" [NeurIPS 2024]☆112Sep 27, 2024Updated last year
- ☆18Jul 1, 2021Updated 4 years ago
- [USENIX Security'24] REMARK-LLM: A robust and efficient watermarking framework for generative large language models☆28Oct 23, 2024Updated last year
- [CIKM 2024] Trojan Activation Attack: Attack Large Language Models using Activation Steering for Safety-Alignment.☆30Jul 29, 2024Updated last year
- ☆600Jul 4, 2025Updated 10 months ago
- GPUs on demand by Runpod - Special Offer Available • AdRun AI, ML, and HPC workloads on powerful cloud GPUs—without limits or wasted spend. Deploy GPUs in under a minute and pay by the second.
- ☆29Aug 21, 2023Updated 2 years ago
- Code for paper: "RemovalNet: DNN model fingerprinting removal attack", IEEE TDSC 2023.☆10Nov 27, 2023Updated 2 years ago
- Code and dataset for the paper: "Can Editing LLMs Inject Harm?" [AAAI'26]☆21Dec 26, 2025Updated 4 months ago
- Security Attacks on LLM-based Code Completion Tools (AAAI 2025)☆22Dec 31, 2025Updated 4 months ago
- Welcome to the official repository for Siren, a project aimed at understanding and mitigating harmful behaviors in large language models …☆15Sep 12, 2025Updated 7 months ago
- Watermarking LLM papers up-to-date☆12Dec 17, 2023Updated 2 years ago
- BrainWash: A Poisoning Attack to Forget in Continual Learning☆12Apr 15, 2024Updated 2 years ago
- Official Repository for ACL 2024 Paper SafeDecoding: Defending against Jailbreak Attacks via Safety-Aware Decoding☆152Jul 19, 2024Updated last year
- Re-thinking Federated Active Learning based on Inter-class Diversity (CVPR 2023)☆31May 31, 2023Updated 2 years ago
- Managed Kubernetes at scale on DigitalOcean • AdDigitalOcean Kubernetes includes the control plane, bandwidth allowance, container registry, automatic updates, and more for free.
- Code for paper: PoisonPrompt: Backdoor Attack on Prompt-based Large Language Models, IEEE ICASSP 2024. Demo//124.220.228.133:11107☆21Aug 10, 2024Updated last year
- About The corresponding code from our paper " Making Reasoning Matter: Measuring and Improving Faithfulness of Chain-of-Thought Reasoning…☆13Jan 14, 2026Updated 3 months ago
- ☆51Oct 23, 2023Updated 2 years ago
- Code and datasets for the salesforce AI research paper on prompt leakage and multi-turn threats against LLMs☆22Nov 10, 2025Updated 5 months ago
- [ICML 2023] Protecting Language Generation Models via Invisible Watermarking☆13Sep 8, 2023Updated 2 years ago
- Python code to automatically produce a summary of a piece of text.☆12Sep 8, 2016Updated 9 years ago
- ☆10Jun 12, 2023Updated 2 years ago
- ☆11Apr 17, 2023Updated 3 years ago
- A python implementation of the concepts in the book "Reinforcement Learning: An Introduction" by R.S. Sutton and A. G. Barto.☆21Jul 13, 2020Updated 5 years ago
- Deploy open-source AI quickly and easily - Special Bonus Offer • AdRunpod Hub is built for open source. One-click deployment and autoscaling endpoints without provisioning your own infrastructure.
- ☆17Nov 8, 2024Updated last year
- This is AlpaGasus2-QLoRA based on LLaMA2 with AlpaGasus mechanism using QLoRA!☆15Nov 22, 2023Updated 2 years ago
- Target Agnostic Attack on Deep Models: Exploiting Security Vulnerabilities of Transfer Learning☆10Jul 2, 2019Updated 6 years ago
- Our Sentimental LIAR dataset is a modified and further extended version of the LIAR extension introduced by Kirilin et al. In our dataset…☆16Mar 31, 2022Updated 4 years ago
- ☆19Mar 26, 2022Updated 4 years ago
- Reinforcing General Reasoning without Verifiers☆99Jun 24, 2025Updated 10 months ago
- State-Relabeling Adversarial Active Learning☆14Aug 17, 2021Updated 4 years ago