usail-hkust / Jailjudge
JAILJUDGE: A comprehensive evaluation benchmark which includes a wide range of risk scenarios with complex malicious prompts (e.g., synthetic, adversarial, in-the-wild, and multi-language scenarios) along with high-quality human-annotated test datasets.
☆33Updated last month
Alternatives and similar repositories for Jailjudge:
Users who are interested in Jailjudge are comparing it to the repositories listed below:
- Bag of Tricks: Benchmarking of Jailbreak Attacks on LLMs. Empirical tricks for LLM Jailbreaking. (NeurIPS 2024)☆110Updated 2 months ago
- [EMNLP 2024] The official GitHub repo for the survey paper "Knowledge Conflicts for LLMs: A Survey"☆100Updated 4 months ago
- ☆70Updated last week
- This is the repo for the survey of Bias and Fairness in IR with LLMs.☆47Updated 3 months ago
- TrustAgent: Towards Safe and Trustworthy LLM-based Agents☆33Updated 6 months ago
- ☆40Updated 3 months ago
- ☆15Updated 3 months ago
- [ICLR'24] RAIN: Your Language Models Can Align Themselves without Finetuning☆88Updated 8 months ago
- Official Code for Paper: Assessing the Brittleness of Safety Alignment via Pruning and Low-Rank Modifications☆68Updated 3 months ago
- A framework to empower LLMs on graph reasoning and generation. Refer to our paper: https://arxiv.org/pdf/2402.08785.pdf☆76Updated 6 months ago
- A collection of resources that investigate social agents.☆97Updated last week
- FedJudge: Federated Legal Large Language Model☆32Updated 4 months ago
- [ICML2024] "LLaGA: Large Language and Graph Assistant", Runjin Chen, Tong Zhao, Ajay Jaiswal, Neil Shah, Zhangyang Wang☆92Updated 4 months ago
- Official repository of "Can Language Models Solve Graph Problems in Natural Language?". NeurIPS 2023 (Spotlight)☆118Updated 5 months ago
- Official implementation for "ALI-Agent: Assessing LLMs' Alignment with Human Values via Agent-based Evaluation"☆15Updated 4 months ago
- Repo of "Large Language Model-based Human-Agent Collaboration for Complex Task Solving" (EMNLP 2024 Findings)☆30Updated 4 months ago
- 【ACL 2024】 SALAD benchmark & MD-Judge☆119Updated last month
- Official Repository for ACL 2024 Paper SafeDecoding: Defending against Jailbreak Attacks via Safety-Aware Decoding☆113Updated 6 months ago
- [NeurIPS 2024] Official code of $\beta$-DPO: Direct Preference Optimization with Dynamic $\beta$☆39Updated 3 months ago
- [NeurIPS 2024] The implementation of paper "On Softmax Direct Preference Optimization for Recommendation"☆58Updated 2 months ago
- ☆13Updated 11 months ago
- ☆40Updated last year
- R-Judge: Benchmarking Safety Risk Awareness for LLM Agents (EMNLP Findings 2024)☆65Updated 3 months ago
- The dataset and code for the ICLR 2024 paper "Can LLM-Generated Misinformation Be Detected?"☆54Updated 2 months ago
- ☆109Updated 4 months ago
- ☆25Updated 8 months ago
- ☆37Updated 7 months ago
- ☆71Updated last month
- ☆49Updated last month