Unofficial implementation of "Backdooring Instruction-Tuned Large Language Models with Virtual Prompt Injection"
☆27Jul 6, 2024Updated last year
Alternatives and similar repositories for virtual-prompt-injection
Users that are interested in virtual-prompt-injection are comparing it to the libraries listed below.
- Working Memory Attack on LLMs☆18May 27, 2025Updated last year
- Code for the paper "Exploring Backdoor Vulnerabilities of Chat Models"☆19Apr 13, 2024Updated 2 years ago
- ☆11Oct 3, 2021Updated 4 years ago
- [ICLR 2025] Code&Data for the paper "Super(ficial)-alignment: Strong Models May Deceive Weak Models in Weak-to-Strong Generalization"☆15Jun 21, 2024Updated last year
- ICL backdoor attack☆17Nov 4, 2024Updated last year
- ☆15Jul 8, 2023Updated 2 years ago
- ☆59May 30, 2024Updated 2 years ago
- ☆22Sep 2, 2025Updated 9 months ago
- Code&Data for the paper "Watch Out for Your Agents! Investigating Backdoor Threats to LLM-Based Agents" [NeurIPS 2024]☆112Sep 27, 2024Updated last year
- [USENIX Security'24] REMARK-LLM: A robust and efficient watermarking framework for generative large language models☆28Oct 23, 2024Updated last year
- [CIKM 2024] Trojan Activation Attack: Attack Large Language Models using Activation Steering for Safety-Alignment.☆30Jul 29, 2024Updated last year
- ☆34Aug 11, 2022Updated 3 years ago
- ☆13Sep 8, 2024Updated last year
- ☆607Jul 4, 2025Updated 11 months ago
- Code for paper: "RemovalNet: DNN model fingerprinting removal attack", IEEE TDSC 2023.☆10Nov 27, 2023Updated 2 years ago
- Code and dataset for the paper: "Can Editing LLMs Inject Harm?" [AAAI'26]☆21Dec 26, 2025Updated 5 months ago
- Security Attacks on LLM-based Code Completion Tools (AAAI 2025)☆23Dec 31, 2025Updated 5 months ago
- TaCo: Enhancing Cross-Lingual Transfer for Low-Resource Languages in LLMs through Translation-Assisted Chain-of-Thought Processes☆14Jul 1, 2025Updated 11 months ago
- RevLLM -- Reverse Engineering Tools for Large Language Models☆22Feb 29, 2024Updated 2 years ago
- BrainWash: A Poisoning Attack to Forget in Continual Learning☆12Apr 15, 2024Updated 2 years ago
- Official Repository for ACL 2024 Paper SafeDecoding: Defending against Jailbreak Attacks via Safety-Aware Decoding☆153Jul 19, 2024Updated last year
- ☆14May 22, 2017Updated 9 years ago
- Re-thinking Federated Active Learning based on Inter-class Diversity (CVPR 2023)☆31May 31, 2023Updated 3 years ago
- Code for paper: PoisonPrompt: Backdoor Attack on Prompt-based Large Language Models, IEEE ICASSP 2024. Demo//124.220.228.133:11107☆21Aug 10, 2024Updated last year
- The corresponding code from our paper "Making Reasoning Matter: Measuring and Improving Faithfulness of Chain-of-Thought Reasoning…☆13Jan 14, 2026Updated 5 months ago
- ☆54Oct 23, 2023Updated 2 years ago
- Code and datasets for the salesforce AI research paper on prompt leakage and multi-turn threats against LLMs☆22Jun 2, 2026Updated last week
- [ICML 2023] Protecting Language Generation Models via Invisible Watermarking☆13Sep 8, 2023Updated 2 years ago
- ☆11Apr 17, 2023Updated 3 years ago
- ☆18Nov 8, 2024Updated last year
- A python implementation of the concepts in the book "Reinforcement Learning: An Introduction" by R.S. Sutton and A. G. Barto.☆21Jul 13, 2020Updated 5 years ago
- This is AlpaGasus2-QLoRA based on LLaMA2 with AlpaGasus mechanism using QLoRA!☆15Nov 22, 2023Updated 2 years ago
- 【Join our constellation of stargazers!⭐️】An interactive AI-powered story generator that creates dynamic narratives through collaborative …☆13Jun 5, 2026Updated last week
- ☆19Mar 26, 2022Updated 4 years ago
- ☆18Oct 7, 2022Updated 3 years ago
- Reinforcing General Reasoning without Verifiers☆100Jun 24, 2025Updated 11 months ago
- Jekyll theme for displaying a resume/CV in a clean, minimalistic way.☆10Jan 4, 2021Updated 5 years ago
- [EMNLP 2022] Distillation-Resistant Watermarking (DRW) for Model Protection in NLP☆13Aug 17, 2023Updated 2 years ago
- ☆11Aug 15, 2023Updated 2 years ago