dapurv5 / awesome-red-teaming-llms
Repository accompanying the paper https://openreview.net/pdf?id=sSAp8ITBpC
☆24Updated 3 weeks ago
Alternatives and similar repositories for awesome-red-teaming-llms
Users that are interested in awesome-red-teaming-llms are comparing it to the libraries listed below
Sorting:
- LLM security and privacy☆49Updated 7 months ago
- Implementation of BEAST adversarial attack for language models (ICML 2024)☆86Updated last year
- A benchmark for evaluating the robustness of LLMs and defenses to indirect prompt injection attacks.☆66Updated last year
- PAL: Proxy-Guided Black-Box Attack on Large Language Models☆50Updated 9 months ago
- Risks and targets for assessing LLMs & LLM vulnerabilities☆30Updated 11 months ago
- ☆62Updated 5 months ago
- Package to optimize Adversarial Attacks against (Large) Language Models with Varied Objectives☆68Updated last year
- TaskTracker is an approach to detecting task drift in Large Language Models (LLMs) by analysing their internal activations. It provides a…☆55Updated 2 months ago
- [ICML 2024] COLD-Attack: Jailbreaking LLMs with Stealthiness and Controllability☆152Updated 5 months ago
- A collection of prompt injection mitigation techniques.☆22Updated last year
- A re-implementation of the "Red Teaming Language Models with Language Models" paper by Perez et al., 2022☆28Updated last year
- The official implementation of our pre-print paper "Automatic and Universal Prompt Injection Attacks against Large Language Models".☆46Updated 6 months ago
- ☆32Updated 6 months ago
- Whispers in the Machine: Confidentiality in Agentic Systems☆37Updated this week
- ☆74Updated 3 weeks ago
- ☆45Updated last year
- [NDSS'25 Best Technical Poster] A collection of automated evaluators for assessing jailbreak attempts.☆156Updated last month
- Repo for the research paper "SecAlign: Defending Against Prompt Injection with Preference Optimization"☆45Updated last month
- ☆44Updated 2 years ago
- Official implementation of paper: DrAttack: Prompt Decomposition and Reconstruction Makes Powerful LLM Jailbreakers☆52Updated 8 months ago
- This repository provides a benchmark for prompt Injection attacks and defenses☆196Updated 2 weeks ago
- ☆35Updated last year
- ☆25Updated 9 months ago
- Code for Preventing Language Models From Hiding Their Reasoning, which evaluates defenses against LLM steganography.☆19Updated last year
- A curated and updated list of relevant articles and repositories on Reinforcement Learning from AI Feedback (RLAIF)☆12Updated last year
- WMDP is a LLM proxy benchmark for hazardous knowledge in bio, cyber, and chemical security. We also release code for RMU, an unlearning m…☆119Updated last year
- A Dynamic Environment to Evaluate Attacks and Defenses for LLM Agents.☆154Updated last week
- ☆62Updated 10 months ago
- [IJCAI 2024] Imperio is an LLM-powered backdoor attack. It allows the adversary to issue language-guided instructions to control the vict…☆41Updated 3 months ago
- Papers about red teaming LLMs and Multimodal models.☆115Updated 5 months ago