whitzard-ai / jade-db

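"Stones from other hills can polish jade": Fudan Whitzard AI (复旦白泽智能) releases JADE-DB, a demo dataset targeting Chinese open-source and overseas commercial large language models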

☆495

Alternatives and similar repositories for jade-db

Users who are interested in jade-db are comparing it to the repositories listed below.

thu-coai / Safety-Prompts
Chinese safety prompts for evaluating and improving the safety of LLMs.
☆1,127 · Feb 27, 2024 · Updated last year
CLUEbenchmark / SuperCLUE-Safety
SC-Safety: a multi-turn adversarial safety benchmark for Chinese LLMs
☆150 · Mar 15, 2024 · Updated last year
sherdencooper / GPTFuzz
Official repo for GPTFUZZER: Red Teaming Large Language Models with Auto-Generated Jailbreak Prompts
☆565 · Sep 24, 2024 · Updated last year
WhitzardIndex / WhitzardBench-2024A
Fudan Whitzard LLM safety benchmark test set (Summer 2024 edition)
☆51 · Jul 31, 2024 · Updated last year
thu-coai / SafetyBench
Official GitHub repo for SafetyBench, a comprehensive benchmark to evaluate LLMs' safety. [ACL 2024]
☆272 · Jul 28, 2025 · Updated 6 months ago
X-PLUG / CValues
Research on evaluating and aligning the values of Chinese LLMs
☆554 · Jul 20, 2023 · Updated 2 years ago
EasyJailbreak / EasyJailbreak
An easy-to-use Python framework to generate adversarial jailbreak prompts.
☆815 · Mar 27, 2025 · Updated 10 months ago
thu-coai / ShieldLM
ShieldLM: Empowering LLMs as Aligned, Customizable and Explainable Safety Detectors [EMNLP 2024 Findings]
☆225 · Sep 29, 2024 · Updated last year
STAIR-BUPT / JailBench
JailBench: a Chinese dataset for evaluating the jailbreak attack risks of large language models [PAKDD 2025]
☆165 · Mar 3, 2025 · Updated 11 months ago
IS2Lab / S-Eval
S-Eval: Towards Automated and Comprehensive Safety Evaluation for Large Language Models
☆109 · Oct 14, 2025 · Updated 4 months ago
ydyjya / Awesome-LLM-Safety
A curated list of safety-related papers, articles, and resources focused on Large Language Models (LLMs). This repository aims to provide…
☆1,769 · Feb 1, 2026 · Updated 2 weeks ago
Clouditera / Clouditera.github.io
Shaping the future intelligent revolution in the security field
☆634 · Jan 26, 2025 · Updated last year
Aatrox103 / SAP
☆48 · May 9, 2024 · Updated last year
Clouditera / SecGPT
SecGPT: a large language model for cybersecurity
☆2,918 · Jun 25, 2025 · Updated 7 months ago
patrickrchao / JailbreakingLLMs
☆696 · Jul 2, 2025 · Updated 7 months ago
tmlr-group / DeepInception
[arXiv:2311.03191] "DeepInception: Hypnotize Large Language Model to Be Jailbreaker"
☆173 · Feb 20, 2024 · Updated last year
CryptoAILab / Awesome-LM-SSP
A reading list for large models safety, security, and privacy (including Awesome LLM Security, Safety, etc.).
☆1,856 · Jan 24, 2026 · Updated 3 weeks ago
RICommunity / TAP
TAP: An automated jailbreaking method for black-box LLMs
☆220 · Dec 10, 2024 · Updated last year
YancyKahn / CoA
Chain of Attack: a Semantic-Driven Contextual Multi-Turn attacker for LLM
☆39 · Jan 17, 2025 · Updated last year
AI45Lab / Flames
Flames is a highly adversarial benchmark in Chinese for LLM's harmlessness evaluation developed by Shanghai AI Lab and Fudan NLP Group.
☆63 · May 21, 2024 · Updated last year
HKUST-KnowComp / LLM-Multistep-Jailbreak
Code for Findings-EMNLP 2023 paper: Multi-step Jailbreaking Privacy Attacks on ChatGPT
☆35 · Oct 15, 2023 · Updated 2 years ago
microsoft / TOXIGEN
This repo contains the code for generating the ToxiGen dataset, published at ACL 2022.
☆346 · Jun 17, 2024 · Updated last year
llm-attacks / llm-attacks
Universal and Transferable Attacks on Aligned Language Models
☆4,493 · Aug 2, 2024 · Updated last year
OpenSafetyLab / SALAD-BENCH
[ACL 2024] SALAD benchmark & MD-Judge
☆170 · Mar 8, 2025 · Updated 11 months ago
rangwang / CCAC2024-FS_Moderation
CCAC2024: repository for the challenge "A Dual Line of Defense for LLM Safety: Few-Shot Text Content Safety"
☆32 · Jun 20, 2024 · Updated last year
TrustAIRLab / HateBench
[USENIX'25] HateBench: Benchmarking Hate Speech Detectors on LLM-Generated Content and Hate Campaigns
☆13 · Mar 1, 2025 · Updated 11 months ago
Tencent / AI-Infra-Guard
A.I.G (AI-Infra-Guard) is a full-stack AI Red Teaming platform developed by Tencent Zhuque Lab that secures your AI ecosystem from infras…
☆2,952 · Updated this week
xirui-li / DrAttack
Official implementation of paper: DrAttack: Prompt Decomposition and Reconstruction Makes Powerful LLM Jailbreakers
☆65 · Aug 25, 2024 · Updated last year
Libr-AI / do-not-answer
Do-Not-Answer: A Dataset for Evaluating Safeguards in LLMs
☆314 · Jun 7, 2024 · Updated last year
AI45Lab / ActorAttack
☆121 · Feb 3, 2025 · Updated last year
CHATS-lab / persuasive_jailbreaker
Persuasive Jailbreaker: we can persuade LLMs to jailbreak them!
☆349 · Oct 17, 2025 · Updated 3 months ago
chawins / llm-sp
Papers and resources related to the security and privacy of LLMs 🤖
☆561 · Jun 8, 2025 · Updated 8 months ago
wrlu / PendingIntentExp
PendingIntent exploit
☆11 · Sep 26, 2023 · Updated 2 years ago
PKU-Alignment / beavertails
BeaverTails is a collection of datasets designed to facilitate research on safety alignment in large language models (LLMs).
☆175 · Oct 27, 2023 · Updated 2 years ago
centerforaisafety / HarmBench
HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal
☆854 · Aug 16, 2024 · Updated last year
LLM-DRA / DRA
[USENIX Security'24] Official repository of "Making Them Ask and Answer: Jailbreaking Large Language Models in Few Queries via Disguise a…
☆112 · Oct 11, 2024 · Updated last year
open-compass / opencompass
OpenCompass is an LLM evaluation platform, supporting a wide range of models (Llama3, Mistral, InternLM2, GPT-4, LLaMa2, Qwen, GLM, Claude, …
☆6,663 · Updated this week
agencyenterprise / PromptInject
PromptInject is a framework that assembles prompts in a modular fashion to provide a quantitative analysis of the robustness of LLMs to a…
☆454 · Feb 26, 2024 · Updated last year
JailbreakBench / jailbreakbench
JailbreakBench: An Open Robustness Benchmark for Jailbreaking Language Models [NeurIPS 2024 Datasets and Benchmarks Track]
☆527 · Apr 4, 2025 · Updated 10 months ago
