HumanCompatibleAI / tensor-trust

A prompt injection game to collect data for robust ML research

☆62

Alternatives and similar repositories for tensor-trust

Users who are interested in tensor-trust are comparing it to the repositories listed below.

AIM-Intelligence / Automated-Multi-Turn-Jailbreaks
☆82 Updated 8 months ago
vinusankars / BEAST
Implementation of BEAST adversarial attack for language models (ICML 2024)
☆90 Updated last year
dsbowen / strong_reject
☆81 Updated last month
JonasGeiping / carving
Package to optimize Adversarial Attacks against (Large) Language Models with Varied Objectives
☆70 Updated last year
chawins / pal
PAL: Proxy-Guided Black-Box Attack on Large Language Models
☆53 Updated 11 months ago
SheltonLiu-N / Universal-Prompt-Injection
The official implementation of our pre-print paper "Automatic and Universal Prompt Injection Attacks against Large Language Models".
☆51 Updated 9 months ago
Yu-Fangxu / COLD-Attack
[ICML 2024] COLD-Attack: Jailbreaking LLMs with Stealthiness and Controllability
☆162 Updated 7 months ago
tml-epfl / llm-adaptive-attacks
Jailbreaking Leading Safety-Aligned LLMs with Simple Adaptive Attacks [ICLR 2025]
☆324 Updated 6 months ago
Libr-AI / OpenRedTeaming
Papers about red teaming LLMs and Multimodal models.
☆131 Updated 2 months ago
microsoft / BIPIA
A benchmark for evaluating the robustness of LLMs and defenses to indirect prompt injection attacks.
☆73 Updated last year
ThuCCSLab / JailbreakEval
[NDSS'25 Best Technical Poster] A collection of automated evaluators for assessing jailbreak attempts.
☆165 Updated 4 months ago
RICommunity / TAP
TAP: An automated jailbreaking method for black-box LLMs
☆180 Updated 7 months ago
CHATS-lab / persuasive_jailbreaker
Persuasive Jailbreaker: we can persuade LLMs to jailbreak them!
☆313 Updated 9 months ago
RapidResponseBench / rapidresponsebench
☆34 Updated 8 months ago
ebagdasa / multimodal_injection
☆91 Updated last year
microsoft / TaskTracker
TaskTracker is an approach to detecting task drift in Large Language Models (LLMs) by analysing their internal activations. It provides a…
☆62 Updated 5 months ago
liu00222 / Open-Prompt-Injection
This repository provides a benchmark for prompt injection attacks and defenses.
☆255 Updated 3 weeks ago
Babelscape / ALERT
Official repository for the paper "ALERT: A Comprehensive Benchmark for Assessing Large Language Models’ Safety through Red Teaming"
☆44 Updated 10 months ago
tml-epfl / llm-past-tense
Does Refusal Training in LLMs Generalize to the Past Tense? [ICLR 2025]
☆72 Updated 6 months ago
ethz-spylab / rlhf_trojan_competition
Finding trojans in aligned LLMs. Official repository for the competition hosted at SaTML 2024.
☆114 Updated last year
LostOxygen / llm-confidentiality
Whispers in the Machine: Confidentiality in Agentic Systems
☆39 Updated 2 months ago
azshue / AutoPoison
The official repository of the paper "On the Exploitability of Instruction Tuning".
☆64 Updated last year
declare-lab / ferret
Ferret: Faster and Effective Automated Red Teaming with Reward-Based Scoring Technique
☆18 Updated 11 months ago
AI-secure / AgentPoison
[NeurIPS 2024] Official implementation for "AgentPoison: Red-teaming LLM Agents via Memory or Knowledge Base Backdoor Poisoning"
☆137 Updated 3 months ago
uiuc-kang-lab / InjecAgent
☆70 Updated last year
dukeceicenter / jailbreak-reasoning-openai-o1o3-deepseek-r1
☆96 Updated 3 months ago
RainJamesY / FuzzLLM
The open-source repository of FuzzLLM
☆27 Updated last year
PKU-YuanGroup / Hallucination-Attack
Attack to induce hallucinations in LLMs
☆156 Updated last year
facebookresearch / SecAlign
Repo for the research paper "SecAlign: Defending Against Prompt Injection with Preference Optimization"
☆63 Updated 2 weeks ago
Princeton-SysML / Jailbreak_LLM
☆178 Updated last year