llm-gasp / gasp

GASP: Efficient Black-Box Generation of Adversarial Suffixes for Jailbreaking LLMs

☆7

Alternatives and similar repositories for gasp:

Users that are interested in gasp are comparing it to the libraries listed below

chawins / pal
PAL: Proxy-Guided Black-Box Attack on Large Language Models
☆49Updated 7 months ago
SheltonLiu-N / Universal-Prompt-Injection
The official implementation of our pre-print paper "Automatic and Universal Prompt Injection Attacks against Large Language Models".
☆43Updated 5 months ago
weizeming / momentum-attack-llm
☆20Updated 2 months ago
pasquini-dario / LLMmap
☆30Updated 6 months ago
dukeceicenter / jailbreak-reasoning-openai-o1o3-deepseek-r1
☆49Updated 3 weeks ago
RapidResponseBench / rapidresponsebench
☆31Updated 4 months ago
AIM-Intelligence / Automated-Multi-Turn-Jailbreaks
☆40Updated 4 months ago
facebookresearch / SecAlign
Repo for the research paper "SecAlign: Defending Against Prompt Injection with Preference Optimization"
☆40Updated 2 months ago
wagner-group / prompt-injection-defense
Fine-tuning base models to build robust task-specific models
☆28Updated 11 months ago
sail-sg / P-DoS
[ArXiv 2024] Denial-of-Service Poisoning Attacks on Large Language Models
☆17Updated 5 months ago
theshi-1128 / ABJ-Attack
LLMs can be Dangerous Reasoners: Analyzing-based Jailbreak Attack on Large Language Models
☆18Updated last week
microsoft / BIPIA
A benchmark for evaluating the robustness of LLMs and defenses to indirect prompt injection attacks.
☆63Updated 11 months ago
ThuCCSLab / JailbreakEval
[NDSS'25 Poster] A collection of automated evaluators for assessing jailbreak attempts.
☆133Updated 3 weeks ago
allenai / wildteaming
☆22Updated 7 months ago
aengusl / latent-adversarial-training
☆32Updated 6 months ago
Aatrox103 / SAP
☆43Updated 10 months ago
SchwinnL / LLM_Embedding_Attack
Code to conduct an embedding attack on LLMs
☆23Updated 2 months ago
dsbowen / strong_reject
☆51Updated 2 months ago
microsoft / TaskTracker
TaskTracker is an approach to detecting task drift in Large Language Models (LLMs) by analysing their internal activations. It provides a…
☆50Updated 3 weeks ago
Yu-Fangxu / COLD-Attack
[ICML 2024] COLD-Attack: Jailbreaking LLMs with Stealthiness and Controllability
☆147Updated 3 months ago
ethz-spylab / satml-llm-ctf
Code used to run the platform for the LLM CTF colocated with SaTML 2024
☆26Updated last year
vinusankars / BEAST
Implementation of BEAST adversarial attack for language models (ICML 2024)
☆81Updated 10 months ago
wegodev2 / virtual-prompt-injection
Unofficial implementation of "Backdooring Instruction-Tuned Large Language Models with Virtual Prompt Injection"
☆18Updated 8 months ago
rotaryhammer / code-autodan
An unofficial implementation of AutoDAN attack on LLMs (arXiv:2310.15140)
☆36Updated last year
Trust4AI / ASTRAL
Automated Safety Testing of Large Language Models
☆13Updated 2 months ago
microsoft / Firewalled-Agentic-Networks
Code for the paper "Firewalls to Secure Dynamic LLM Agentic Networks"
☆9Updated last month
GodXuxilie / PromptAttack
An LLM can Fool Itself: A Prompt-Based Adversarial Attack (ICLR 2024)
☆79Updated 2 months ago
thu-coai / Agent-SafetyBench
☆23Updated last month
ChenWu98 / agent-attack
[ICLR 2025] Dissecting Adversarial Robustness of Multimodal LM Agents
☆77Updated last month
TrustAIResearch / MLHospital
☆44Updated last year