google-research-datasets / adversarial-nibbler
This dataset contains results from all rounds of Adversarial Nibbler. It includes adversarial prompts fed into public generative text-to-image (t2i) models, along with validations of the unsafe images they produced. There are two sets of data: all prompts submitted as unsafe, and all prompts attempted (sent to t2i models but not submitted as unsafe).
☆25 · Updated 11 months ago
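As a quick orientation, here is a minimal sketch of how the two prompt sets might be loaded and inspected with pandas. The CSV file names and column layout below are assumptions for illustration, not the repository's documented schema.

```python
import pandas as pd

# NOTE: the file names below are assumptions for illustration;
# check the repository for the actual data file names and schema.
SUBMITTED_CSV = "submitted_prompts.csv"  # prompts flagged as producing unsafe images
ATTEMPTED_CSV = "attempted_prompts.csv"  # prompts sent to t2i models but not submitted

submitted = pd.read_csv(SUBMITTED_CSV)
attempted = pd.read_csv(ATTEMPTED_CSV)

print(f"{len(submitted)} submitted prompts, {len(attempted)} attempted prompts")
print("Columns in submitted set:", submitted.columns.tolist())  # inspect available fields
```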
Alternatives and similar repositories for adversarial-nibbler
Users interested in adversarial-nibbler are comparing it to the repositories listed below.
- ☆48 · Updated 11 months ago
- An LLM can Fool Itself: A Prompt-Based Adversarial Attack (ICLR 2024) ☆110 · Updated last year
- The official repository of the paper "On the Exploitability of Instruction Tuning". ☆68 · Updated last year
- The official implementation of our pre-print paper "Automatic and Universal Prompt Injection Attacks against Large Language Models". ☆68 · Updated last year
- PAL: Proxy-Guided Black-Box Attack on Large Language Models ☆57 · Updated last year
- ☆23 · Updated last year
- ☆23 · Updated last year
- ☆48 · Updated last year
- Code for paper "Universal Jailbreak Backdoors from Poisoned Human Feedback" ☆66 · Updated last year
- [ICLR 2025] Dissecting adversarial robustness of multimodal language model agents ☆123 · Updated 11 months ago
- Code & Data for the paper "Watch Out for Your Agents! Investigating Backdoor Threats to LLM-Based Agents" [NeurIPS 2024] ☆107 · Updated last year
- ☆47 · Updated last year
- ☆70 · Updated last year
- ☆43 · Updated 2 years ago
- Code to conduct an embedding attack on LLMs ☆31 · Updated last year
- [ICML 2024] COLD-Attack: Jailbreaking LLMs with Stealthiness and Controllability ☆176 · Updated last year
- Code for Findings-EMNLP 2023 paper: Multi-step Jailbreaking Privacy Attacks on ChatGPT ☆35 · Updated 2 years ago
- AmpleGCG: Learning a Universal and Transferable Generator of Adversarial Attacks on Both Open and Closed LLM ☆83 · Updated last year
- [ICLR 2024] Data for "Multilingual Jailbreak Challenges in Large Language Models" ☆97 · Updated last year
- Improved Few-Shot Jailbreaking Can Circumvent Aligned Language Models and Their Defenses (NeurIPS 2024) ☆65 · Updated last year
- Code repo for the paper: Attacking Vision-Language Computer Agents via Pop-ups ☆50 · Updated last year
- Code for the paper "Be Careful about Poisoned Word Embeddings: Exploring the Vulnerability of the Embedding Layers in NLP Models" (NAACL-… ☆44 · Updated 4 years ago
- Official implementation of ICLR'24 paper, "Curiosity-driven Red Teaming for Large Language Models" (https://openreview.net/pdf?id=4KqkizX… ☆87 · Updated last year
- Official implementation of AdvPrompter: https://arxiv.org/abs/2404.16873 ☆174 · Updated last year
- Attack to induce hallucinations in LLMs ☆164 · Updated last year
- Fingerprint large language models ☆49 · Updated last year
- [NeurIPS 2024] Official implementation for "AgentPoison: Red-teaming LLM Agents via Memory or Knowledge Base Backdoor Poisoning" ☆193 · Updated 9 months ago
- Official repository for "PostMark: A Robust Blackbox Watermark for Large Language Models" ☆27 · Updated last year
- A lightweight library for large language model (LLM) jailbreaking defense. ☆61 · Updated 4 months ago
- [ICLR'24] RAIN: Your Language Models Can Align Themselves without Finetuning ☆98 · Updated last year