JailBench: A Chinese Benchmark Dataset for Evaluating Jailbreak Attack Risks in Large Language Models [PAKDD 2025]
☆174 · Mar 3, 2025 · Updated last year
Alternatives and similar repositories for JailBench
Users that are interested in JailBench are comparing it to the libraries listed below.
Sorting:
- ☆12 · Sep 29, 2024 · Updated last year
- ShieldLM: Empowering LLMs as Aligned, Customizable and Explainable Safety Detectors [EMNLP 2024 Findings] ☆228 · Sep 29, 2024 · Updated last year
- "Stones from other hills may serve to polish jade" (他山之石、可以攻玉): Fudan's Whitzard (白泽智能) team releases JADE-DB, a demo dataset targeting domestic open-source and overseas commercial LLMs ☆507 · Nov 18, 2025 · Updated 5 months ago
- SC-Safety: A Chinese multi-turn adversarial safety benchmark for large language models ☆150 · Mar 15, 2024 · Updated 2 years ago
- ☆21 · Jul 26, 2025 · Updated 9 months ago
- 【ACL 2024】 SALAD benchmark & MD-Judge ☆175 · Mar 8, 2025 · Updated last year
- Official Repository for ACL 2024 Paper SafeDecoding: Defending against Jailbreak Attacks via Safety-Aware Decoding ☆152 · Jul 19, 2024 · Updated last year
- ☆26 · Nov 4, 2024 · Updated last year
- 🚀 JailbreakBench is a testing tool for evaluating the safety of large language models (LLMs), focused on measuring their resistance to jailbreak attacks. It simulates malicious prompt injection, encoding attacks, and multi-turn dialogue manipulation to quantify a model's vulnerability risk and generate detailed reports with visual analysis. Supports Chinese and English datasets; suitable for security research… ☆32 · Sep 1, 2025 · Updated 8 months ago
- Official Implementation of implicit reference attack ☆11 · Oct 16, 2024 · Updated last year
- Red Queen Dataset and data generation template ☆26 · Dec 26, 2025 · Updated 4 months ago
- AmpleGCG: Learning a Universal and Transferable Generator of Adversarial Attacks on Both Open and Closed LLM ☆86 · Nov 3, 2024 · Updated last year
- Chinese safety prompts for evaluating and improving the safety of LLMs. ☆1,151 · Feb 27, 2024 · Updated 2 years ago
- Submission Guide + Discussion Board for AI Singapore Global Challenge for Safe and Secure LLMs (Track 1A). ☆16 · Jul 4, 2024 · Updated last year
- Benchmarking Large Language Models' Resistance to Malicious Code ☆16 · Apr 23, 2026 · Updated last week
- SecProbe: A task-driven safety capability evaluation system for large language models ☆15 · Nov 29, 2024 · Updated last year
- ☆166 · Sep 2, 2024 · Updated last year
- Jailbreaking Leading Safety-Aligned LLMs with Simple Adaptive Attacks [ICLR 2025] ☆385 · Jan 23, 2025 · Updated last year
- [NDSS'25 Best Technical Poster] A collection of automated evaluators for assessing jailbreak attempts. ☆191 · Apr 1, 2025 · Updated last year
- [NDSS'25] The official implementation of safety misalignment. ☆19 · Jan 8, 2025 · Updated last year
- Repository for the work of the CoSAI Technical Steering Committee (TSC) ☆21 · Updated this week
- Code for Rethinking Prompt Optimizers: From Prompt Merits to Optimization ☆13 · Jan 12, 2026 · Updated 3 months ago
- LLM Self Defense: By Self Examination, LLMs know they are being tricked ☆52 · May 21, 2024 · Updated last year
- Implementation of the paper "Defending Large Language Models against Jailbreak Attacks via Semantic Smoothing" ☆24 · Jun 9, 2024 · Updated last year
- ☆26 · Jan 12, 2026 · Updated 3 months ago
- [ICLR24] Official Repo of BadChain: Backdoor Chain-of-Thought Prompting for Large Language Models ☆53 · Jul 24, 2024 · Updated last year
- [USENIX'25] HateBench: Benchmarking Hate Speech Detectors on LLM-Generated Content and Hate Campaigns ☆14 · Mar 1, 2025 · Updated last year
- A curated collection of common CTF brute-forcing dictionaries ☆16 · Apr 8, 2023 · Updated 3 years ago
- A novel approach to improving the safety of large language models, enabling them to transition effectively from an unsafe to a safe state. ☆72 · May 22, 2025 · Updated 11 months ago
- Research on value evaluation and alignment for Chinese large language models ☆556 · Jul 20, 2023 · Updated 2 years ago
- The most comprehensive and accurate LLM jailbreak attack benchmark to date ☆21 · Mar 22, 2025 · Updated last year
- ☆31 · Oct 23, 2024 · Updated last year
- [USENIX'24] Prompt Stealing Attacks Against Text-to-Image Generation Models ☆51 · Jan 11, 2025 · Updated last year
- An easy-to-use Python framework to generate adversarial jailbreak prompts. ☆848 · Mar 30, 2026 · Updated last month
- [NeurIPS 2024] Fight Back Against Jailbreaking via Prompt Adversarial Tuning ☆11 · Oct 29, 2024 · Updated last year
- A summary of adversarial attacks against large language models ☆38 · Dec 22, 2023 · Updated 2 years ago
- IoM default mal package ☆10 · Feb 22, 2026 · Updated 2 months ago
- Automatically configures sqlmap to use a proxy (automatically obtains proxy IPs) ☆13 · Aug 6, 2020 · Updated 5 years ago
- A universal exploit for the recently popular SSL VPN security appliances ☆27 · Dec 15, 2023 · Updated 2 years ago