theshi-1128 / llm-defense

An easy-to-use Python framework to defend against jailbreak prompts.

☆21

Alternatives and similar repositories for llm-defense

Users who are interested in llm-defense are comparing it to the repositories listed below.

xingjunm / Awesome-Large-Model-Safety
Safety at Scale: A Comprehensive Survey of Large Model Safety
☆194 Updated 7 months ago
liudaizong / Awesome-LVLM-Attack
😎 up-to-date & curated list of awesome Attacks on Large-Vision-Language-Models papers, methods & resources.
☆394 Updated last week
sleeepeer / PoisonedRAG
[USENIX Security 2025] PoisonedRAG: Knowledge Corruption Attacks to Retrieval-Augmented Generation of Large Language Models
☆202 Updated 7 months ago
chawins / llm-sp
Papers and resources related to the security and privacy of LLMs 🤖
☆536 Updated 4 months ago
DSN-2024 / DSN
DSN jailbreak Attack & Evaluation Ensemble
☆10 Updated 2 months ago
NY1024 / Foundation-Model-Paper-Notes
☆65 Updated 4 months ago
SheltonLiu-N / AutoDAN
[ICLR 2024] The official implementation of our ICLR2024 paper "AutoDAN: Generating Stealthy Jailbreak Prompts on Aligned Large Language M…
☆383 Updated 8 months ago
ThuCCSLab / FigStep
[AAAI'25 (Oral)] Jailbreaking Large Vision-language Models via Typographic Visual Prompts
☆173 Updated 3 months ago
bboylyg / BackdoorLLM
[NeurIPS 2025] BackdoorLLM: A Comprehensive Benchmark for Backdoor Attacks and Defenses on Large Language Models
☆219 Updated 2 weeks ago
theshi-1128 / jailbreak-bench
The most comprehensive and accurate LLM jailbreak attack benchmark by far
☆21 Updated 6 months ago
WUSTL-CSPL / LLMJailbreak
☆36 Updated last year
niconi19 / LLM-Conversation-Safety
[NAACL2024] Attacks, Defenses and Evaluations for LLM Conversation Safety: A Survey
☆106 Updated last year
chen37058 / Red-Team-Arxiv-Paper-Update
Awesome Jailbreak, red teaming arxiv papers (automatically updated every 12 hours)
☆64 Updated this week
thunlp / OpenBackdoor
An open-source toolkit for textual backdoor attack and defense (NeurIPS 2022 D&B, Spotlight)
☆191 Updated 2 years ago
Unispac / Visual-Adversarial-Examples-Jailbreak-Large-Language-Models
Repository for the Paper (AAAI 2024, Oral) --- Visual Adversarial Examples Jailbreak Large Language Models
☆240 Updated last year
ledllm / ledllm
☆21 Updated last year
ThuCCSLab / Awesome-LM-SSP
A reading list for large models safety, security, and privacy (including Awesome LLM Security, Safety, etc.).
☆1,691 Updated this week
agiresearch / ASB
Agent Security Bench (ASB)
☆124 Updated this week
isXinLiu / Awesome-MLLM-Safety
Accepted by IJCAI-24 Survey Track
☆216 Updated last year
isXinLiu / MM-SafetyBench
Accepted by ECCV 2024
☆158 Updated 11 months ago
Aatrox103 / SAP
☆46 Updated last year
mengtong0110 / InferDPT
☆32 Updated 6 months ago
liuxuannan / Awesome-Multimodal-Jailbreak
A Survey on Jailbreak Attacks and Defenses against Multimodal Generative Models
☆234 Updated last month
NJUNLP / ReNeLLM
The official implementation of our NAACL 2024 paper "A Wolf in Sheep’s Clothing: Generalized Nested Jailbreak Prompts can Fool Large Lang…
☆137 Updated last month
JailbreakBench / jailbreakbench
JailbreakBench: An Open Robustness Benchmark for Jailbreaking Language Models [NeurIPS 2024 Datasets and Benchmarks Track]
☆424 Updated 6 months ago
casperllm / CASPER
☆15 Updated last year
ZJUICSR / AIcert
☆223 Updated last month
ltroin / llm_attack_defense_arena
☆82 Updated last month
AI45Lab / ActorAttack
☆104 Updated 8 months ago
GodXuxilie / PromptAttack
An LLM can Fool Itself: A Prompt-Based Adversarial Attack (ICLR 2024)
☆99 Updated 8 months ago