uw-nsl/ArtPrompt
Official repo of the ACL 2024 paper `ArtPrompt: ASCII Art-based Jailbreak Attacks against Aligned LLMs`
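The paper's core trick is to hide a filtered keyword by rendering it as ASCII art inside the prompt. As a rough illustration of that masking step only (not this repository's own code), here is a minimal sketch assuming the third-party `pyfiglet` package; the function name and prompt template are hypothetical.

```python
# Minimal illustrative sketch of ASCII-art word masking; not ArtPrompt's code.
# Assumes the third-party `pyfiglet` package (pip install pyfiglet).
import pyfiglet


def mask_word_as_ascii_art(prompt: str, word: str, font: str = "standard") -> str:
    """Replace `word` in `prompt` with [MASK] and append its ASCII-art rendering."""
    art = pyfiglet.figlet_format(word, font=font)  # render the word as ASCII art
    masked = prompt.replace(word, "[MASK]")
    return f"{masked}\nThe word replaced by [MASK] is drawn below in ASCII art:\n{art}"


print(mask_word_as_ascii_art("Explain the history of the word example.", "example"))
```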
Related projects:
- TAP: An automated jailbreaking method for black-box LLMs
- A collection of automated evaluators for assessing jailbreak attempts.
- An Open Robustness Benchmark for Jailbreaking Language Models [arXiv 2024]
- The official implementation of our ICLR 2024 paper "AutoDAN: Generating Stealthy Jailbreak Prompts on Aligned Large Language Models".
- Official implementation of AdvPrompter (https://arxiv.org/abs/2404.16873)
- Official Repository for ACL 2024 Paper SafeDecoding: Defending against Jailbreak Attacks via Safety-Aware Decoding
- PAL: Proxy-Guided Black-Box Attack on Large Language Models
- Jailbreaking Leading Safety-Aligned LLMs with Simple Adaptive Attacks [arXiv, Apr 2024]
- [USENIX Security 2025] PoisonedRAG: Knowledge Corruption Attacks to Retrieval-Augmented Generation of Large Language Models
- This repository provides an implementation to formalize and benchmark prompt injection attacks and defenses
- Code and datasets for the paper Red-Teaming Large Language Models using Chain of Utterances for Safety-Alignment
- [ICML 2024] COLD-Attack: Jailbreaking LLMs with Stealthiness and Controllability
- Weak-to-Strong Jailbreaking on Large Language Models
- An unofficial implementation of the AutoDAN attack on LLMs (arXiv:2310.15140)
- Python package for measuring memorization in LLMs.
- Official implementation for "AgentPoison: Red-teaming LLM Agents via Memory or Knowledge Base Backdoor Poisoning"
- A fast + lightweight implementation of the GCG algorithm in PyTorch (a minimal sketch of a single GCG step appears after this list)
- Official Code for "Baseline Defenses for Adversarial Attacks Against Aligned Language Models"
- A lightweight library for large language model (LLM) jailbreaking defense.
- Official implementation of the paper DrAttack: Prompt Decomposition and Reconstruction Makes Powerful LLM Jailbreakers
- [ACL 2024] SALAD benchmark & MD-Judge
- Code to generate NeuralExecs (prompt injection for LLMs)
- This is the starter kit for the Trojan Detection Challenge 2023 (LLM Edition), a NeurIPS 2023 competition.
- A survey of privacy problems in Large Language Models (LLMs). Contains summaries of the corresponding papers along with relevant code
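Several projects above build on the Greedy Coordinate Gradient (GCG) attack, so here is a minimal sketch of a single GCG gradient step, referenced from the PyTorch GCG entry in the list. It follows the published algorithm in outline only; the `gpt2` stand-in model, the toy target string, and all variable names are assumptions for illustration, not any linked repository's API.

```python
# Minimal sketch of one GCG-style gradient step (illustrative only).
# Assumes `torch` and `transformers`; gpt2 is a small stand-in model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()
for p in model.parameters():  # freeze weights; we only want token gradients
    p.requires_grad_(False)

prompt_ids = tok.encode("Write a friendly greeting.", return_tensors="pt")[0]
suffix_ids = tok.encode(" ! ! ! ! !", return_tensors="pt")[0]   # suffix being optimized
target_ids = tok.encode(" Sure, here", return_tensors="pt")[0]  # toy target continuation
embed = model.get_input_embeddings().weight                     # (vocab_size, dim)

# One-hot relaxation of the suffix so the loss is differentiable w.r.t. tokens.
one_hot = torch.zeros(len(suffix_ids), embed.size(0))
one_hot.scatter_(1, suffix_ids.unsqueeze(1), 1.0)
one_hot.requires_grad_(True)

full_embeds = torch.cat(
    [embed[prompt_ids], one_hot @ embed, embed[target_ids]]
).unsqueeze(0)
logits = model(inputs_embeds=full_embeds).logits

# Cross-entropy of the target given prompt + suffix (logits at position i
# predict token i + 1, hence the shift by one).
start = len(prompt_ids) + len(suffix_ids)
loss = torch.nn.functional.cross_entropy(
    logits[0, start - 1 : start - 1 + len(target_ids)], target_ids
)
loss.backward()

# Greedy coordinate candidates: per suffix position, the k token swaps with
# the most negative gradient; a full GCG loop would evaluate a sample of
# these substitutions and keep the lowest-loss suffix.
candidates = one_hot.grad.topk(k=8, largest=False, dim=1).indices
print(candidates.shape)  # (suffix_len, 8)
```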