JailBench: A Chinese dataset for evaluating jailbreak attack risks in large language models [PAKDD 2025]
☆183Mar 3, 2025Updated last year
Alternatives and similar repositories for JailBench
Users that are interested in JailBench are comparing it to the libraries listed below.
- ☆12Sep 29, 2024Updated last year
- ShieldLM: Empowering LLMs as Aligned, Customizable and Explainable Safety Detectors [EMNLP 2024 Findings]☆231Sep 29, 2024Updated last year
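- "Stones from other hills may serve to polish jade": a series on large-model evaluation and governance released by the Fudan JADE team☆512May 14, 2026Updated last week
- SC-Safety: A multi-turn adversarial safety benchmark for Chinese large language models☆151Mar 15, 2024Updated 2 years ago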
- ☆22Jul 26, 2025Updated 10 months ago
- [ACL 2024] SALAD benchmark & MD-Judge☆175Mar 8, 2025Updated last year
- Official Repository for ACL 2024 Paper SafeDecoding: Defending against Jailbreak Attacks via Safety-Aware Decoding☆152Jul 19, 2024Updated last year
- ☆26Nov 4, 2024Updated last year
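- 🚀 JailbreakBench is a testing tool for evaluating the safety of large language models (LLMs), focused on measuring a model's resistance to jailbreak attacks. By simulating malicious prompt injection, encoding attacks, and multi-turn dialogue manipulation, it quantifies a model's vulnerability risk and generates detailed reports and visual analyses. It supports Chinese and English datasets and is suited to security research…☆34Sep 1, 2025Updated 8 months ago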
- Official Implementation of implicit reference attack☆11Oct 16, 2024Updated last year
- Red Queen Dataset and data generation template☆26Dec 26, 2025Updated 4 months ago
- AmpleGCG: Learning a Universal and Transferable Generator of Adversarial Attacks on Both Open and Closed LLM☆87Nov 3, 2024Updated last year
- Chinese safety prompts for evaluating and improving the safety of LLMs.☆1,162Feb 27, 2024Updated 2 years ago
- Submission Guide + Discussion Board for AI Singapore Global Challenge for Safe and Secure LLMs (Track 1A).☆16Jul 4, 2024Updated last year
- Benchmarking Large Language Models' Resistance to Malicious Code☆16Apr 23, 2026Updated last month
- SecProbe: A task-driven system for evaluating the security capabilities of large models☆15Nov 29, 2024Updated last year
- ☆169Sep 2, 2024Updated last year
- Jailbreaking Leading Safety-Aligned LLMs with Simple Adaptive Attacks [ICLR 2025]☆386Jan 23, 2025Updated last year
- [NDSS'25] The official implementation of safety misalignment.☆19Jan 8, 2025Updated last year
- Repository for the work of the CoSAI Technical Steering Committee (TSC)☆23Updated this week
- Code for Rethinking Prompt Optimizers: From Prompt Merits to Optimization☆13Jan 12, 2026Updated 4 months ago
- LLM Self Defense: By Self Examination, LLMs know they are being tricked☆51May 21, 2024Updated 2 years ago
- Official Code for EMNLP 2023 paper: "Unveiling the Implicit Toxicity in Large Language Models"☆15Nov 30, 2023Updated 2 years ago
- ☆28Jan 12, 2026Updated 4 months ago
- [ICLR24] Official Repo of BadChain: Backdoor Chain-of-Thought Prompting for Large Language Models☆53Jul 24, 2024Updated last year
- [USENIX'25] HateBench: Benchmarking Hate Speech Detectors on LLM-Generated Content and Hate Campaigns☆14Mar 1, 2025Updated last year
- A collection of commonly used CTF brute-force dictionaries☆16Apr 8, 2023Updated 3 years ago
- A novel approach to improve the safety of large language models, enabling them to transition effectively from unsafe to safe state.☆72May 22, 2025Updated last year
- Research on evaluating and aligning the values of Chinese large language models☆556Jul 20, 2023Updated 2 years ago
- The most comprehensive and accurate LLM jailbreak attack benchmark by far☆21Mar 22, 2025Updated last year
- [USENIX'24] Prompt Stealing Attacks Against Text-to-Image Generation Models☆51Jan 11, 2025Updated last year
- An easy-to-use Python framework to generate adversarial jailbreak prompts.☆855Mar 30, 2026Updated last month
- [NeurIPS 2024] Fight Back Against Jailbreaking via Prompt Adversarial Tuning☆11Oct 29, 2024Updated last year
- A summary of adversarial attacks against large language models☆38Dec 22, 2023Updated 2 years ago
- IoM default mal package☆10Feb 22, 2026Updated 3 months ago
- Configure sqlmap to use a proxy automatically (automatically obtains proxy IPs)☆13Aug 6, 2020Updated 5 years ago
- An exploit that works across the recently popular SSL VPN security appliances☆27Dec 15, 2023Updated 2 years ago
- Build a static gdb for all supported architectures☆26Apr 27, 2022Updated 4 years ago
- A BCEL library dedicated to the woodpecker framework☆12Apr 30, 2021Updated 5 years ago