rucnyz / PrivAgentLinks

☆19

Alternatives and similar repositories for PrivAgent

Users that are interested in PrivAgent are comparing it to the libraries listed below

Sorting:

lancopku / agent-backdoor-attacks
Code&Data for the paper "Watch Out for Your Agents! Investigating Backdoor Threats to LLM-Based Agents" [NeurIPS 2024]
☆85Updated 10 months ago
BHui97 / PLeak
☆60Updated 7 months ago
AI-secure / AdvAgent
☆12Updated 2 months ago
AI45Lab / ActorAttack
☆97Updated 6 months ago
SheltonLiu-N / Universal-Prompt-Injection
The official implementation of our pre-print paper "Automatic and Universal Prompt Injection Attacks against Large Language Models".
☆52Updated 9 months ago
usail-hkust / JailTrickBench
Bag of Tricks: Benchmarking of Jailbreak Attacks on LLMs. Empirical tricks for LLM Jailbreaking. (NeurIPS 2024)
☆144Updated 8 months ago
AI-secure / AgentPoison
[NeurIPS 2024] Official implementation for "AgentPoison: Red-teaming LLM Agents via Memory or Knowledge Base Backdoor Poisoning"
☆137Updated 3 months ago
wagner-group / prompt-injection-defense
Fine-tuning base models to build robust task-specific models
☆31Updated last year
Django-Jiang / BadChain
[ICLR24] Official Repo of BadChain: Backdoor Chain-of-Thought Prompting for Large Language Models
☆36Updated last year
sleeepeer / PoisonedRAG
[USENIX Security 2025] PoisonedRAG: Knowledge Corruption Attacks to Retrieval-Augmented Generation of Large Language Models
☆176Updated 5 months ago
agiresearch / ASB
Agent Security Bench (ASB)
☆102Updated last month
uiuc-kang-lab / InjecAgent
☆70Updated last year
AI45Lab / CodeAttack
[ACL 2024] CodeAttack: Revealing Safety Generalization Challenges of Large Language Models via Code Completion
☆50Updated 9 months ago
chen37058 / Red-Team-Arxiv-Paper-Update
Awesome Jailbreak, red teaming arxiv papers (Automatically Update Every 12th hours)
☆45Updated this week
Lyz1213 / BadEdit
☆32Updated 9 months ago
reds-lab / BEEAR
This is the official Gtihub repo for our paper: "BEEAR: Embedding-based Adversarial Removal of Safety Backdoors in Instruction-tuned Lang…
☆17Updated last year
HKUST-KnowComp / LLM-Multistep-Jailbreak
Code for Findings-EMNLP 2023 paper: Multi-step Jailbreaking Privacy Attacks on ChatGPT
☆34Updated last year
CLAS2024 / starter-kit
☆39Updated 9 months ago
SaFoLab-WISC / JailBreakV_28K
[COLM 2024] JailBreakV-28K: A comprehensive benchmark designed to evaluate the transferability of LLM jailbreak attacks to MLLMs, and fur…
☆75Updated 3 months ago
Sizhe-Chen / StruQ
official implementation of [USENIX Sec'25] StruQ: Defending Against Prompt Injection with Structured Queries
☆44Updated 2 weeks ago
pasquini-dario / LLM_NeuralExec
Code to generate NeuralExecs (prompt injection for LLMs)
☆22Updated 8 months ago
wang2226 / Trojan-Activation-Attack
[CIKM 2024] Trojan Activation Attack: Attack Large Language Models using Activation Steering for Safety-Alignment.
☆25Updated last year
Yu-Fangxu / COLD-Attack
[ICML 2024] COLD-Attack: Jailbreaking LLMs with Stealthiness and Controllability
☆162Updated 7 months ago
OSU-NLP-Group / EIA_against_webagent
☆30Updated 10 months ago
inspire-group / RobustRAG
☆18Updated 10 months ago
uw-nsl / ArtPrompt
[ACL24] Official Repo of Paper `ArtPrompt: ASCII Art-based Jailbreak Attacks against Aligned LLMs`
☆79Updated 5 months ago
ChenWu98 / agent-attack
[ICLR 2025] Dissecting adversarial robustness of multimodal language model agents
☆98Updated 5 months ago
ThuCCSLab / FigStep
[AAAI'25 (Oral)] Jailbreaking Large Vision-language Models via Typographic Visual Prompts
☆160Updated last month
SproutNan / AI-Safety_SCAV
This is the code repository for "Uncovering Safety Risks of Large Language Models through Concept Activation Vector"
☆43Updated 8 months ago
AI-secure / RedCode
[NeurIPS'24] RedCode: Risky Code Execution and Generation Benchmark for Code Agents
☆45Updated last month