parameterlab / trapLinks

Source code of "TRAP: Targeted Random Adversarial Prompt Honeypot for Black-Box Identification", ACL2024 (findings)

☆12

Alternatives and similar repositories for trap

Users that are interested in trap are comparing it to the libraries listed below

Sorting:

vinusankars / BEAST
Implementation of BEAST adversarial attack for language models (ICML 2024)
☆90Updated last year
pasquini-dario / LLMmap
☆51Updated 2 weeks ago
HKU-TASR / Imperio
[IJCAI 2024] Imperio is an LLM-powered backdoor attack. It allows the adversary to issue language-guided instructions to control the vict…
☆42Updated 5 months ago
leondz / lm_risk_cards
Risks and targets for assessing LLMs & LLM vulnerabilities
☆32Updated last year
dreadnode / research
General research for Dreadnode
☆23Updated last year
uiuc-kang-lab / InjecAgent
☆70Updated last year
dreadnode / parley
Tree of Attacks (TAP) Jailbreaking Implementation
☆114Updated last year
lakeraai / pint-benchmark
A benchmark for prompt injection detection systems.
☆124Updated 3 weeks ago
RenatoGeh / advtok
Adversarial Tokenization
☆24Updated 3 months ago
controllability / jailbreak-evaluation
The jailbreak-evaluation is an easy-to-use Python package for language model jailbreak evaluation.
☆25Updated 9 months ago
chawins / pal
PAL: Proxy-Guided Black-Box Attack on Large Language Models
☆53Updated 11 months ago
liu00222 / Open-Prompt-Injection
This repository provides a benchmark for prompt Injection attacks and defenses
☆255Updated 3 weeks ago
mitre-atlas / atlas-data
ATLAS tactics, techniques, and case studies data
☆77Updated 3 months ago
Valhall-ai / prompt-injection-mitigations
A collection of prompt injection mitigation techniques.
☆23Updated last year
microsoft / BIPIA
A benchmark for evaluating the robustness of LLMs and defenses to indirect prompt injection attacks.
☆73Updated last year
SheltonLiu-N / Universal-Prompt-Injection
The official implementation of our pre-print paper "Automatic and Universal Prompt Injection Attacks against Large Language Models".
☆51Updated 9 months ago
lve-org / lve
A repository of Language Model Vulnerabilities and Exposures (LVEs).
☆113Updated last year
Libr-AI / OpenRedTeaming
Papers about red teaming LLMs and Multimodal models.
☆131Updated 2 months ago
AIM-Intelligence / Automated-Multi-Turn-Jailbreaks
☆82Updated 8 months ago
ThuCCSLab / JailbreakEval
[NDSS'25 Best Technical Poster] A collection of automated evaluators for assessing jailbreak attempts.
☆165Updated 4 months ago
ebagdasa / multimodal_injection
☆91Updated last year
Yu-Fangxu / COLD-Attack
[ICML 2024] COLD-Attack: Jailbreaking LLMs with Stealthiness and Controllability
☆162Updated 7 months ago
daniel-e / secml
Security Weaknesses in Machine Learning
☆15Updated last year
kyleeasterly / loose-lips-multipliers
DEF CON 31 AI Village - LLMs: Loose Lips Multipliers
☆10Updated last year
briland / LLM-security-and-privacy
LLM security and privacy
☆49Updated 9 months ago
BishopFox / BrokenHill
A productionized greedy coordinate gradient (GCG) attack tool for large language models (LLMs)
☆123Updated 7 months ago
TrustAIResearch / MLHospital
☆44Updated 2 years ago
facebookresearch / privacy_adversarial_framework
The Privacy Adversarial Framework (PAF) is a knowledge base of privacy-focused adversarial tactics and techniques. PAF is heavily inspire…
☆58Updated last year
Reapor-Yurnero / imprompter
Codebase of https://arxiv.org/abs/2410.14923
☆49Updated 9 months ago
NickNameInvalid / LLM_CTF
☆65Updated 6 months ago