MurrayTom / SG-Bench

SG-Bench: Evaluating LLM Safety Generalization Across Diverse Tasks and Prompt Types

☆20

Alternatives and similar repositories for SG-Bench

Users who are interested in SG-Bench are comparing it to the repositories listed below.

AI45Lab / ActorAttack
☆97 · Updated 6 months ago
thu-coai / JailbreakDefense_GoalPriority
[ACL 2024] Defending Large Language Models Against Jailbreaking Attacks Through Goal Prioritization
☆26 · Updated last year
uw-nsl / SafeDecoding
Official Repository for ACL 2024 Paper SafeDecoding: Defending against Jailbreak Attacks via Safety-Aware Decoding
☆140 · Updated last year
ydyjya / LLM-IHS-Explanation
☆51 · Updated last year
salman-lui / x-teaming
☆32 · Updated 2 months ago
chujiezheng / LLM-Safeguard
Official repository for ICML 2024 paper "On Prompt-Driven Safeguarding for Large Language Models"
☆94 · Updated 2 months ago
ChenWu98 / agent-attack
[ICLR 2025] Dissecting adversarial robustness of multimodal language model agents
☆98 · Updated 5 months ago
thu-coai / SafeUnlearning
Safe Unlearning: A Surprisingly Effective and Generalizable Solution to Defend Against Jailbreak Attacks
☆29 · Updated last year
SaFoLab-WISC / JailBreakV_28K
[COLM 2024] JailBreakV-28K: A comprehensive benchmark designed to evaluate the transferability of LLM jailbreak attacks to MLLMs, and fur…
☆75 · Updated 3 months ago
OpenSafetyLab / SALAD-BENCH
[ACL 2024] SALAD benchmark & MD-Judge
☆156 · Updated 5 months ago
SORRY-Bench / sorry-bench
Benchmark evaluation code for "SORRY-Bench: Systematically Evaluating Large Language Model Safety Refusal" (ICLR 2025)
☆57 · Updated 5 months ago
zihao-ai / unthinking_vulnerability
To Think or Not to Think: Exploring the Unthinking Vulnerability in Large Reasoning Models
☆31 · Updated 2 months ago
isXinLiu / MM-SafetyBench
Accepted by ECCV 2024
☆147 · Updated 9 months ago
THU-BPM / Watermark-Radioactivity-Attack
Code and data for paper "Can LLM Watermarks Robustly Prevent Unauthorized Knowledge Distillation?". (ACL 2025 Main)
☆16 · Updated last month
niconi19 / LLM-Conversation-Safety
[NAACL 2024] Attacks, Defenses and Evaluations for LLM Conversation Safety: A Survey
☆106 · Updated last year
AI45Lab / CodeAttack
[ACL 2024] CodeAttack: Revealing Safety Generalization Challenges of Large Language Models via Code Completion
☆50 · Updated 9 months ago
wonderNefelibata / Awesome-LRM-Safety
Awesome Large Reasoning Model (LRM) Safety. This repository collects security-related research on large reasoning models such as …
☆68 · Updated last week
yjw1029 / Self-Reminder
Code for our paper "Defending ChatGPT against Jailbreak Attack via Self-Reminder" in NMI.
☆53 · Updated last year
xyq7 / GradSafe
Official Code for ACL 2024 paper "GradSafe: Detecting Unsafe Prompts for LLMs via Safety-Critical Gradient Analysis"
☆57 · Updated 9 months ago
YihanWang617 / LLM-Jailbreaking-Defense-Backtranslation
Code for paper "Defending against LLM Jailbreaking via Backtranslation"
☆30 · Updated 11 months ago
DAMO-NLP-SG / multilingual-safety-for-LLMs
[ICLR 2024] Data for "Multilingual Jailbreak Challenges in Large Language Models"
☆79 · Updated last year
thunxxx / MLLM-Jailbreak-evaluation-MMJ-Bench
☆55 · Updated 4 months ago
OPTML-Group / SOUL
Official repo for EMNLP'24 paper "SOUL: Unlocking the Power of Second-Order Optimization for LLM Unlearning"
☆26 · Updated 10 months ago
thu-coai / LRM-Safety-Study
☆15 · Updated 2 months ago
YihanWang617 / llm-jailbreaking-defense
A lightweight library for large language model (LLM) jailbreaking defense.
☆54 · Updated 9 months ago
HKUST-KnowComp / LLM-Multistep-Jailbreak
Code for Findings-EMNLP 2023 paper: Multi-step Jailbreaking Privacy Attacks on ChatGPT
☆34 · Updated last year
OSU-NLP-Group / AmpleGCG
AmpleGCG: Learning a Universal and Transferable Generator of Adversarial Attacks on Both Open and Closed LLM
☆69 · Updated 9 months ago
boyiwei / alignment-attribution-code
[ICML 2024] Assessing the Brittleness of Safety Alignment via Pruning and Low-Rank Modifications
☆82 · Updated 4 months ago
SproutNan / AI-Safety_SCAV
This is the code repository for "Uncovering Safety Risks of Large Language Models through Concept Activation Vector"
☆43 · Updated 8 months ago
Princeton-SysML / Jailbreak_LLM
☆178 · Updated last year