Unofficial implementation of "Backdooring Instruction-Tuned Large Language Models with Virtual Prompt Injection"
☆27 · Jul 6, 2024 · Updated last year
Alternatives and similar repositories for virtual-prompt-injection
Users that are interested in virtual-prompt-injection are comparing it to the libraries listed below.
- Working Memory Attack on LLMs · ☆17 · May 27, 2025 · Updated 10 months ago
- AnyDoor: Test-Time Backdoor Attacks on Multimodal Large Language Models · ☆61 · Apr 8, 2024 · Updated 2 years ago
- Code for the paper "Exploring Backdoor Vulnerabilities of Chat Models" · ☆19 · Apr 13, 2024 · Updated 2 years ago
- ☆11 · Oct 3, 2021 · Updated 4 years ago
- [ICLR 2025] Code&Data for the paper "Super(ficial)-alignment: Strong Models May Deceive Weak Models in Weak-to-Strong Generalization" · ☆15 · Jun 21, 2024 · Updated last year
- Implementation of BadCLIP (https://arxiv.org/pdf/2311.16194.pdf) · ☆24 · Mar 23, 2024 · Updated 2 years ago
- ICL backdoor attack · ☆17 · Nov 4, 2024 · Updated last year
- ☆15 · Jul 8, 2023 · Updated 2 years ago
- ☆59 · May 30, 2024 · Updated last year
- ☆32 · Sep 3, 2024 · Updated last year
- ☆22 · Sep 2, 2025 · Updated 7 months ago
- Code&Data for the paper "Watch Out for Your Agents! Investigating Backdoor Threats to LLM-Based Agents" [NeurIPS 2024] · ☆112 · Sep 27, 2024 · Updated last year
- Evaluating Durability: Benchmark Insights into Multimodal Watermarking · ☆12 · Jun 7, 2024 · Updated last year
- [CIKM 2024] Trojan Activation Attack: Attack Large Language Models using Activation Steering for Safety-Alignment · ☆30 · Jul 29, 2024 · Updated last year
- ☆595 · Jul 4, 2025 · Updated 9 months ago
- Code for paper: "RemovalNet: DNN model fingerprinting removal attack", IEEE TDSC 2023 · ☆10 · Nov 27, 2023 · Updated 2 years ago
- Code and dataset for the paper: "Can Editing LLMs Inject Harm?" · ☆21 · Dec 26, 2025 · Updated 3 months ago
- Security Attacks on LLM-based Code Completion Tools (AAAI 2025) · ☆22 · Dec 31, 2025 · Updated 3 months ago
- Welcome to the official repository for Siren, a project aimed at understanding and mitigating harmful behaviors in large language models … · ☆15 · Sep 12, 2025 · Updated 7 months ago
- Watermarking LLM papers, up-to-date · ☆11 · Dec 17, 2023 · Updated 2 years ago
- Official Repository for ACL 2024 Paper SafeDecoding: Defending against Jailbreak Attacks via Safety-Aware Decoding · ☆152 · Jul 19, 2024 · Updated last year
- ☆14 · May 22, 2017 · Updated 8 years ago
- Code for paper: PoisonPrompt: Backdoor Attack on Prompt-based Large Language Models, IEEE ICASSP 2024. Demo: //124.220.228.133:11107 · ☆21 · Aug 10, 2024 · Updated last year
- The corresponding code from our paper "Making Reasoning Matter: Measuring and Improving Faithfulness of Chain-of-Thought Reasoning… · ☆13 · Jan 14, 2026 · Updated 3 months ago
- ☆52 · Oct 23, 2023 · Updated 2 years ago
- [ICML 2023] Protecting Language Generation Models via Invisible Watermarking · ☆13 · Sep 8, 2023 · Updated 2 years ago
- Python code to automatically produce a summary of a piece of text · ☆12 · Sep 8, 2016 · Updated 9 years ago
- ☆11 · Apr 17, 2023 · Updated 2 years ago
- ☆17 · Nov 8, 2024 · Updated last year
- This is AlpaGasus2-QLoRA, based on LLaMA2 with the AlpaGasus mechanism using QLoRA! · ☆15 · Nov 22, 2023 · Updated 2 years ago
- Target Agnostic Attack on Deep Models: Exploiting Security Vulnerabilities of Transfer Learning · ☆10 · Jul 2, 2019 · Updated 6 years ago
- 【Join our constellation of stargazers!⭐️】An interactive AI-powered story generator that creates dynamic narratives through collaborative … · ☆13 · Updated this week
- ☆19 · Mar 26, 2022 · Updated 4 years ago
- Code for paper "Universal Jailbreak Backdoors from Poisoned Human Feedback" · ☆65 · Apr 24, 2024 · Updated last year
- ☆18 · Oct 7, 2022 · Updated 3 years ago
- Reinforcing General Reasoning without Verifiers · ☆97 · Jun 24, 2025 · Updated 9 months ago
- State-Relabeling Adversarial Active Learning · ☆14 · Aug 17, 2021 · Updated 4 years ago
- [EMNLP 2022] Distillation-Resistant Watermarking (DRW) for Model Protection in NLP · ☆13 · Aug 17, 2023 · Updated 2 years ago
- Your finetuned model's back to its original safety standards faster than you can say "SafetyLock"! · ☆11 · Oct 16, 2024 · Updated last year