google-research-datasets / adversarial-nibbler
This dataset contains results from all rounds of Adversarial Nibbler. It includes adversarial prompts fed into public generative text-to-image models, along with validations of the unsafe images those prompts produced. The data is released as two sets: all prompts submitted as unsafe, and all prompts attempted (sent to the text-to-image models but not submitted as unsafe).
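As a rough sketch of how the two sets might be explored, the snippet below loads each split with pandas and compares them. The file names (`submitted_prompts.csv`, `attempted_prompts.csv`) and the `prompt` column are placeholders for illustration, not the dataset's actual file layout or schema.

```python
import pandas as pd

# Placeholder file names -- substitute the actual CSV files shipped with the dataset.
submitted = pd.read_csv("submitted_prompts.csv")   # prompts submitted as unsafe
attempted = pd.read_csv("attempted_prompts.csv")   # prompts sent to t2i models but not submitted

print(f"{len(submitted)} submitted prompts, {len(attempted)} attempted prompts")

# Example query: prompts that were attempted but never submitted as unsafe
# (assumes both splits share a hypothetical 'prompt' column).
attempted_only = attempted[~attempted["prompt"].isin(submitted["prompt"])]
print(attempted_only.head())
```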
Related projects:
- The official repository of the paper "On the Exploitability of Instruction Tuning".
- The official implementation of our pre-print paper "Automatic and Universal Prompt Injection Attacks against Large Language Models".
- PAL: Proxy-Guided Black-Box Attack on Large Language Models
- Package to optimize Adversarial Attacks against (Large) Language Models with Varied Objectives
- Code for the paper "BadPrompt: Backdoor Attacks on Continuous Prompts"
- Repo for arXiv preprint "Gradient-based Adversarial Attacks against Text Transformers"
- Implementation of the paper "Exploring the Universal Vulnerability of Prompt-based Learning Paradigm" (Findings of NAACL 2022)
- Code for the paper "Spinning Language Models: Risks of Propaganda-as-a-Service and Countermeasures"
- A Dynamic Environment to Evaluate Attacks and Defenses for LLM Agents.
- SORRY-Bench: Systematically Evaluating Large Language Model Safety Refusal Behaviors
- Code & data for the paper "Watch Out for Your Agents! Investigating Backdoor Threats to LLM-Based Agents"
- Code for the paper "Be Careful about Poisoned Word Embeddings: Exploring the Vulnerability of the Embedding Layers in NLP Models" (NAACL-…)
- AnyDoor: Test-Time Backdoor Attacks on Multimodal Large Language Models
- Code for the paper "Universal Jailbreak Backdoors from Poisoned Human Feedback"
- [arXiv 2024] Adversarial attacks on multimodal agents
- Official repository for the ACL 2024 paper "SafeDecoding: Defending against Jailbreak Attacks via Safety-Aware Decoding"
- All in How You Ask for It: Simple Black-Box Method for Jailbreak Attacks
- Code for the paper "Defending ChatGPT against Jailbreak Attack via Self-Reminder" (Nature Machine Intelligence).