Lordog / R-Judge
R-Judge: Benchmarking Safety Risk Awareness for LLM Agents (EMNLP Findings 2024)
☆80 · Updated 2 months ago
Alternatives and similar repositories for R-Judge
Users interested in R-Judge are comparing it to the repositories listed below.
- [ACL 2024] SALAD benchmark & MD-Judge ☆154 · Updated 4 months ago
- [ICLR'24] RAIN: Your Language Models Can Align Themselves without Finetuning ☆94 · Updated last year
- BeaverTails is a collection of datasets designed to facilitate research on safety alignment in large language models (LLMs). ☆148 · Updated last year
- Official GitHub repo for AutoDetect, an automated weakness detection framework for LLMs. ☆42 · Updated last year
- S-Eval: Towards Automated and Comprehensive Safety Evaluation for Large Language Models ☆73 · Updated last week
- [ICML 2025] Weak-to-Strong Jailbreaking on Large Language Models ☆76 · Updated 2 months ago
- A novel approach to improving the safety of large language models, enabling them to transition effectively from an unsafe to a safe state. ☆61 · Updated last month
- LLM Unlearning