allenai / wildguard
Open One-Stop Moderation Tools for Safety Risks, Jailbreaks, and Refusals of LLMs
☆97, updated last year
Alternatives and similar repositories for wildguard
Users who are interested in wildguard are comparing it to the libraries listed below.
- Improving Alignment and Robustness with Circuit Breakers (☆248, updated last year)
- Official Repository for ACL 2024 Paper SafeDecoding: Defending against Jailbreak Attacks via Safety-Aware Decoding (☆152, updated last year)
- Röttger et al. (NAACL 2024): "XSTest: A Test Suite for Identifying Exaggerated Safety Behaviours in Large Language Models"