WhitzardIndex / WhitzardBench-2024A
复旦白泽大模型安全基准测试集(2024年夏季版)
☆21Updated last month
Related projects: ⓘ
- Official github repo for SafetyBench, a comprehensive benchmark to evaluate LLMs' safety.☆141Updated 2 months ago
- SC-Safety: 中文大模型多轮对抗安全基准☆94Updated 6 months ago
- ShieldLM: Empowering LLMs as Aligned, Customizable and Explainable Safety Detectors☆134Updated 2 months ago
- "他山之石、可以攻玉":复旦白泽智能发布面向国内开源和国外商用大模型的Demo数据集JADE-DB☆293Updated 2 months ago
- Official repo for GPTFUZZER : Red Teaming Large Language Models with Auto-Generated Jailbreak Prompts☆366Updated 5 months ago
- This project aims to consolidate and share high-quality resources and tools across the cybersecurity domain.☆42Updated last week
- [NAACL2024] Attacks, Defenses and Evaluations for LLM Conversation Safety: A Survey☆65Updated last month
- Chinese safety prompts for evaluating and improving the safety of LLMs. 中文安全prompts,用于评估和提升大模型的安全性。☆845Updated 6 months ago
- The official implementation of our ICLR2024 paper "AutoDAN: Generating Stealthy Jailbreak Prompts on Aligned Large Language Models".☆203Updated last month
- Hide and Seek (HaS): A Framework for Prompt Privacy Protection☆24Updated last year
- 面向中文大模型价值观的评估与对齐研究☆468Updated last year
- AutoAudit—— the LLM for Cyber Security 网络安全大语言模型☆250Updated 4 months ago
- MarkLLM: An Open-Source Toolkit for LLM Watermarking.☆246Updated last month
- A collection of automated evaluators for assessing jailbreak attempts.☆55Updated 2 months ago
- Flames is a highly adversarial benchmark in Chinese for LLM's harmlessness evaluation developed by Shanghai AI Lab and Fudan NLP Group.☆30Updated 3 months ago
- The repository of paper "HackMentor: Fine-Tuning Large Language Models for Cybersecurity".☆89Updated 3 months ago
- ☆37Updated 4 months ago
- [arXiv:2311.03191] "DeepInception: Hypnotize Large Language Model to Be Jailbreaker"☆109Updated 7 months ago
- [USENIX Security'24] Official repository of "Making Them Ask and Answer: Jailbreaking Large Language Models in Few Queries via Disguise a…☆36Updated 3 weeks ago
- ☆131Updated last month
- Papers about red teaming LLMs and Multimodal models.☆66Updated this week
- ☆66Updated 5 months ago
- Repository for Towards Codable Watermarking for Large Language Models☆26Updated last year
- Do-Not-Answer: A Dataset for Evaluating Safeguards in LLMs☆156Updated 3 months ago
- R-Judge: Benchmarking Safety Risk Awareness for LLM Agents☆57Updated last month
- An easy-to-use Python framework to generate adversarial jailbreak prompts.☆403Updated 2 weeks ago
- ☆12Updated 6 months ago