JailBench: a Chinese dataset for evaluating the jailbreak-attack risk of large language models [PAKDD 2025]
☆170 · Mar 3, 2025 · Updated last year
Alternatives and similar repositories for JailBench
Users that are interested in JailBench are comparing it to the libraries listed below.
- ☆12 · Sep 29, 2024 · Updated last year
- ShieldLM: Empowering LLMs as Aligned, Customizable and Explainable Safety Detectors [EMNLP 2024 Findings] ☆226 · Sep 29, 2024 · Updated last year
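- "Stones from other hills can polish jade": Fudan Whitzard (白泽智能) releases JADE-DB, a demo dataset targeting domestic open-source and foreign commercial large models ☆500 · Nov 18, 2025 · Updated 4 months ago
- SC-Safety: a multi-turn adversarial safety benchmark for Chinese large models ☆150 · Mar 15, 2024 · Updated 2 years ago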
- ☆21 · Jul 26, 2025 · Updated 7 months ago
- [ACL 2024] SALAD benchmark & MD-Judge ☆171 · Mar 8, 2025 · Updated last year
- Official Repository for ACL 2024 Paper SafeDecoding: Defending against Jailbreak Attacks via Safety-Aware Decoding ☆151 · Jul 19, 2024 · Updated last year
- ☆25 · Nov 4, 2024 · Updated last year
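- 🚀 JailbreakBench is a testing tool for evaluating the safety of large language models (LLMs), focused on measuring a model's resistance to jailbreak attacks. By simulating malicious prompt injection, encoding attacks, and multi-turn conversation manipulation, it quantifies a model's vulnerability risk and generates detailed reports and visual analyses. It supports Chinese and English datasets and is suitable for security research… ☆31 · Sep 1, 2025 · Updated 6 months ago
- Benchmarking Large Language Models' Resistance to Malicious Code ☆14 · Dec 1, 2024 · Updated last year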
- Official Implementation of implicit reference attack ☆11 · Oct 16, 2024 · Updated last year
- Red Queen Dataset and data generation template ☆27 · Dec 26, 2025 · Updated 2 months ago
- AmpleGCG: Learning a Universal and Transferable Generator of Adversarial Attacks on Both Open and Closed LLM ☆85 · Nov 3, 2024 · Updated last year
- Chinese safety prompts for evaluating and improving the safety of LLMs. ☆1,136 · Feb 27, 2024 · Updated 2 years ago
- Submission Guide + Discussion Board for AI Singapore Global Challenge for Safe and Secure LLMs (Track 1A). ☆16 · Jul 4, 2024 · Updated last year
- SecProbe: a task-driven system for evaluating the security capabilities of large models ☆15 · Nov 29, 2024 · Updated last year
- ☆165 · Sep 2, 2024 · Updated last year
- Jailbreaking Leading Safety-Aligned LLMs with Simple Adaptive Attacks [ICLR 2025] ☆380 · Jan 23, 2025 · Updated last year
- [NDSS'25 Best Technical Poster] A collection of automated evaluators for assessing jailbreak attempts. ☆188 · Apr 1, 2025 · Updated 11 months ago
- A curated collection of common CTF brute-force wordlists ☆15 · Apr 8, 2023 · Updated 2 years ago
- [NDSS'25] The official implementation of safety misalignment. ☆17 · Jan 8, 2025 · Updated last year
- Code for Rethinking Prompt Optimizers: From Prompt Merits to Optimization ☆13 · Jan 12, 2026 · Updated 2 months ago
- LLM Self Defense: By Self Examination, LLMs know they are being tricked ☆51 · May 21, 2024 · Updated last year
- Official Code for EMNLP 2023 paper: "Unveiling the Implicit Toxicity in Large Language Models" ☆15 · Nov 30, 2023 · Updated 2 years ago
- Implementation of paper 'Defending Large Language Models against Jailbreak Attacks via Semantic Smoothing' ☆23 · Jun 9, 2024 · Updated last year
- [ICLR24] Official Repo of BadChain: Backdoor Chain-of-Thought Prompting for Large Language Models ☆50 · Jul 24, 2024 · Updated last year
- [USENIX'25] HateBench: Benchmarking Hate Speech Detectors on LLM-Generated Content and Hate Campaigns ☆13 · Mar 1, 2025 · Updated last year
- A novel approach to improve the safety of large language models, enabling them to transition effectively from unsafe to safe state. ☆72 · May 22, 2025 · Updated 10 months ago
- [USENIX'24] Prompt Stealing Attacks Against Text-to-Image Generation Models ☆51 · Jan 11, 2025 · Updated last year
- Research on evaluating and aligning the values of Chinese large models ☆554 · Jul 20, 2023 · Updated 2 years ago
- The most comprehensive and accurate LLM jailbreak attack benchmark by far ☆22 · Mar 22, 2025 · Updated last year
- ☆30 · Oct 23, 2024 · Updated last year
- An easy-to-use Python framework to generate adversarial jailbreak prompts. ☆826 · Mar 27, 2025 · Updated 11 months ago
- It is a pure front-end tool for testing the security boundaries of large language models, helping researchers to find and fix potential s… ☆20 · May 6, 2025 · Updated 10 months ago
- [NeurIPS 2024] Fight Back Against Jailbreaking via Prompt Adversarial Tuning ☆11 · Oct 29, 2024 · Updated last year
- A summary of adversarial attacks on large language models ☆39 · Dec 22, 2023 · Updated 2 years ago
- IoM default mal package ☆10 · Feb 22, 2026 · Updated last month
- Configure sqlmap to use a proxy automatically (automatically fetches proxy IPs) ☆14 · Aug 6, 2020 · Updated 5 years ago
- ☆39 · May 17, 2025 · Updated 10 months ago