OSU-NLP-Group / AgentSafety
☆41Updated this week
Alternatives and similar repositories for AgentSafety:
Users who are interested in AgentSafety are comparing it to the libraries listed below:
- R-Judge: Benchmarking Safety Risk Awareness for LLM Agents (EMNLP Findings 2024)☆65Updated 3 months ago
- A lightweight library for large language model (LLM) jailbreaking defense.☆44Updated 3 months ago
- Röttger et al. (2023): "XSTest: A Test Suite for Identifying Exaggerated Safety Behaviours in Large Language Models"☆77Updated last year
- 【ACL 2024】 SALAD benchmark & MD-Judge☆116Updated last month
- Official repository for ICML 2024 paper "On Prompt-Driven Safeguarding for Large Language Models"☆83Updated 4 months ago
- [EMNLP 2024] The official GitHub repo for the paper "Course-Correction: Safety Alignment Using Synthetic Preferences"☆19Updated 3 months ago
- ☆14Updated 3 months ago
- [ICLR'24 Spotlight] A language model (LM)-based emulation framework for identifying the risks of LM agents with tool use☆125Updated 9 months ago
- ☆21Updated 6 months ago
- An open-source library for contamination detection in NLP datasets and Large Language Models (LLMs).☆46Updated 5 months ago
- ☆17Updated 2 months ago
- GitHub repository for "FELM: Benchmarking Factuality Evaluation of Large Language Models" (NeurIPS 2023)☆57Updated last year
- Official code for the paper: Evaluating Copyright Takedown Methods for Language Models☆16Updated 6 months ago
- Official Repository for ACL 2024 Paper SafeDecoding: Defending against Jailbreak Attacks via Safety-Aware Decoding☆112Updated 5 months ago
- [FCS'24] LVLM Safety paper☆17Updated 2 weeks ago
- Semi-Parametric Editing with a Retrieval-Augmented Counterfactual Model☆66Updated 2 years ago
- ☆36Updated last year
- [ACL 2024] Shifting Attention to Relevance: Towards the Predictive Uncertainty Quantification of Free-Form Large Language Models☆42Updated 4 months ago
- [ICLR'24] RAIN: Your Language Models Can Align Themselves without Finetuning☆88Updated 7 months ago
- Lightweight tool to identify Data Contamination in LLMs evaluation☆45Updated 10 months ago
- [ACL'24] Code and data of paper "When is Tree Search Useful for LLM Planning? It Depends on the Discriminator"☆53Updated 10 months ago
- [ICLR'24 Spotlight] "Adaptive Chameleon or Stubborn Sloth: Revealing the Behavior of Large Language Models in Knowledge Conflicts"☆64Updated 9 months ago
- Code for the paper <SelfCheck: Using LLMs to Zero-Shot Check Their Own Step-by-Step Reasoning>☆48Updated last year
- This repository contains data, code and models for contextual noncompliance.☆19Updated 6 months ago
- Grade-School Math with Irrelevant Context (GSM-IC) benchmark is an arithmetic reasoning dataset built upon GSM8K, by adding irrelevant se…☆58Updated last year
- ☆44Updated 4 months ago
- Official repository for "Robust Prompt Optimization for Defending Language Models Against Jailbreaking Attacks"☆48Updated 5 months ago
- Official implementation of ICLR'24 paper, "Curiosity-driven Red Teaming for Large Language Models" (https://openreview.net/pdf?id=4KqkizX…☆67Updated 10 months ago
- Recent papers on (1) Psychology of LLMs; (2) Biases in LLMs.☆44Updated last year
- [EMNLP 2023] MQuAKE: Assessing Knowledge Editing in Language Models via Multi-Hop Questions☆105Updated 4 months ago