Do-Not-Answer: A Dataset for Evaluating Safeguards in LLMs
☆328 · Jun 7, 2024 · Updated 2 years ago
Alternatives and similar repositories for do-not-answer
Users who are interested in do-not-answer are comparing it to the libraries listed below.
- We jailbreak GPT-3.5 Turbo’s safety guardrails by fine-tuning it on only 10 adversarially designed examples, at a cost of less than $0.20… ☆351 · Feb 23, 2024 · Updated 2 years ago
- Röttger et al. (NAACL 2024): "XSTest: A Test Suite for Identifying Exaggerated Safety Behaviours in Large Language Models" ☆135 · Feb 24, 2025 · Updated last year
- Code and datasets for the paper "Red-Teaming Large Language Models using Chain of Utterances for Safety-Alignment" ☆111 · Mar 8, 2024 · Updated 2 years ago
- ICLR 2024 paper. Showing properties of safety tuning and exaggerated safety. ☆94 · May 9, 2024 · Updated 2 years ago
- [ICLR 2024] Data for "Multilingual Jailbreak Challenges in Large Language Models" ☆105 · Mar 7, 2024 · Updated 2 years ago
- Official GitHub repo for SafetyBench, a comprehensive benchmark to evaluate LLMs' safety. [ACL 2024] ☆288 · Jul 28, 2025 · Updated 10 months ago
- ☆39 · May 21, 2024 · Updated 2 years ago
- Chinese safety prompts for evaluating and improving the safety of LLMs. 中文安全prompts,用于评估和提升大模型的安全性。 ☆1,171 · Feb 27, 2024 · Updated 2 years ago
- Open One-Stop Moderation Tools for Safety Risks, Jailbreaks, and Refusals of LLMs ☆123 · Dec 2, 2024 · Updated last year
- ☆32 · Aug 9, 2024 · Updated last year
- Official Implementation of "Learning to Refuse: Towards Mitigating Privacy Risks in LLMs" ☆10 · Dec 13, 2024 · Updated last year
- Persuasive Jailbreaker: we can persuade LLMs to jailbreak them! ☆358 · Oct 17, 2025 · Updated 7 months ago
- ☆19 · Jun 21, 2025 · Updated 11 months ago
- [NAACL 2024] Attacks, Defenses and Evaluations for LLM Conversation Safety: A Survey ☆111 · Aug 7, 2024 · Updated last year
- HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal ☆976 · Aug 16, 2024 · Updated last year