Phantivia / T-PGD
[Findings of ACL 2023] Bridge the Gap Between CV and NLP! An Optimization-based Textual Adversarial Attack Framework.
☆12 · Updated last year
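T-PGD adapts PGD-style optimization, standard in computer-vision attacks, to text by perturbing continuous token embeddings. Below is a minimal sketch of that core idea; the function name, the HF-style `inputs_embeds`/`.logits` interface, and the L∞ budget are illustrative assumptions, not this repository's actual API.

```python
# Hypothetical sketch of PGD on token embeddings for a textual attack.
# All names here are placeholders, not T-PGD's real interface.
import torch

def pgd_embedding_attack(embeds, labels, model, loss_fn,
                         eps=0.5, alpha=0.1, steps=10):
    """Projected gradient ascent on input embeddings.

    embeds: (batch, seq_len, dim) token embeddings of the clean input.
    Returns perturbed embeddings projected into an L-inf ball of radius eps.
    """
    orig = embeds.detach()
    adv = orig.clone()
    for _ in range(steps):
        adv.requires_grad_(True)
        # Assumes an HF-style model that accepts inputs_embeds and
        # returns an object with a .logits field.
        logits = model(inputs_embeds=adv).logits
        loss = loss_fn(logits, labels)
        grad, = torch.autograd.grad(loss, adv)
        with torch.no_grad():
            adv = adv + alpha * grad.sign()              # ascend the victim loss
            adv = orig + (adv - orig).clamp(-eps, eps)   # project back into the ball
    return adv.detach()
```

This covers only the continuous optimization step; mapping the perturbed embeddings back to discrete tokens (which the full framework must also do) is omitted here.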
Related projects:
- Improved techniques for optimization-based jailbreaking on large language models ☆33 · Updated 3 months ago
- ☆14 · Updated 2 months ago
- Code & data for the paper "Watch Out for Your Agents! Investigating Backdoor Threats to LLM-Based Agents" ☆29 · Updated 3 months ago
- Unofficial implementation of "Backdooring Instruction-Tuned Large Language Models with Virtual Prompt Injection" ☆11 · Updated 2 months ago
- "In-Context Unlearning: Language Models as Few Shot Unlearners". Martin Pawelczyk, Seth Neel* and Himabindu Lakkaraju*; ICML 2024. ☆13 · Updated 11 months ago
- Code for the paper "BadPrompt: Backdoor Attacks on Continuous Prompts" ☆32 · Updated 2 months ago
- ☆32 · Updated 11 months ago
- Implementation of the paper "Exploring the Universal Vulnerability of Prompt-based Learning Paradigm" (Findings of NAACL 2022) ☆26 · Updated 2 years ago
- A curated list of trustworthy Generative AI papers, updated daily ☆67 · Updated 2 weeks ago
- Code and data of the ACL-IJCNLP 2021 paper "Hidden Killer: Invisible Textual Backdoor Attacks with Syntactic Trigger" ☆37 · Updated 2 years ago
- Code and data of the EMNLP 2022 paper "Why Should Adversarial Perturbations be Imperceptible? Rethink the Research Paradigm in Adversaria…" ☆32 · Updated last year
- Official repo for the paper "SOUL: Unlocking the Power of Second-Order Optimization for LLM Unlearning" ☆13 · Updated 4 months ago
- ☆37 · Updated 10 months ago
- Code for "Searching for an Effective Defender: Benchmarking Defense against Adversarial Word Substitution" ☆30 · Updated 10 months ago
- ☆19 · Updated 7 months ago
- AnyDoor: Test-Time Backdoor Attacks on Multimodal Large Language Models ☆39 · Updated 5 months ago
- ☆21 · Updated 2 months ago
- Official repository for the ICML 2024 paper "On Prompt-Driven Safeguarding for Large Language Models" ☆64 · Updated 2 weeks ago
- Code for the paper "Universal Jailbreak Backdoors from Poisoned Human Feedback" ☆39 · Updated 4 months ago
- An unofficial implementation of the AutoDAN attack on LLMs (arXiv:2310.15140) ☆27 · Updated 7 months ago
- Repo for the arXiv preprint "Gradient-based Adversarial Attacks against Text Transformers" ☆98 · Updated last year
- ☆12 · Updated 4 months ago
- A lightweight library for large language model (LLM) jailbreaking defense ☆26 · Updated last month
- Data for our paper "Defending ChatGPT against Jailbreak Attack via Self-Reminder" ☆17 · Updated 10 months ago
- [ACL 2024] CodeAttack: Revealing Safety Generalization Challenges of Large Language Models via Code Completion ☆14 · Updated this week
- Code for the paper "Rethinking Stealthiness of Backdoor Attack against NLP Models" (ACL-IJCNLP 2021) ☆21 · Updated 2 years ago
- ☆63 · Updated 10 months ago
- Official code implementation of SKU, accepted to ACL 2024 Findings ☆11 · Updated 4 months ago
- Improved Few-Shot Jailbreaking Can Circumvent Aligned Language Models and Their Defenses (NextGenAISafety @ ICML 2024) ☆37 · Updated last month
- [ICLR'24] RAIN: Your Language Models Can Align Themselves without Finetuning ☆79 · Updated 3 months ago