Aligning AI With Shared Human Values (ICLR 2021)
☆315 Apr 21, 2023 Updated 2 years ago
Alternatives and similar repositories for ethics
Users who are interested in ethics compare it to the repositories listed below.
- Data and code for the "Moral Stories: Situated Reasoning about Norms, Intents, Actions, and their Consequences" (Emelin et al., 2021) pap… ☆63 Jul 18, 2022 Updated 3 years ago
- A corpus and code for understanding norms and subjectivity. 🤖 ☆53 Sep 26, 2024 Updated last year
- Jiminy Cricket Environment (NeurIPS 2021) ☆25 Feb 12, 2022 Updated 4 years ago
- Social Chemistry 101: Learning to Reason about Social and Moral Norms ☆34 Mar 17, 2023 Updated 2 years ago
- Dataset associated with "BOLD: Dataset and Metrics for Measuring Biases in Open-Ended Language Generation" paper ☆87 Mar 2, 2021 Updated 5 years ago
- ☆229 Feb 23, 2021 Updated 5 years ago
- ☆147 Jul 23, 2025 Updated 7 months ago
- Data and code for the paper "The Moral Integrity Corpus: A Benchmark for Ethical Dialogue Systems" ☆21 Jul 18, 2023 Updated 2 years ago
- Efficient and Effective Weight-Ensembling Mixture of Experts for Multi-Task Model Merging. Arxiv, 2024. ☆16 Oct 28, 2024 Updated last year
- Official Implementation of "Learning to Refuse: Towards Mitigating Privacy Risks in LLMs" ☆10 Dec 13, 2024 Updated last year
- 🐥 Code and Dataset for our EMNLP 2022 paper - "ProsocialDialog: A Prosocial Backbone for Conversational Agents" ☆65 Aug 2, 2023 Updated 2 years ago
- ☆15 Oct 23, 2023 Updated 2 years ago
- Measuring Massive Multitask Language Understanding | ICLR 2021 ☆1,558 May 28, 2023 Updated 2 years ago
- ACL 2022: An Empirical Survey of the Effectiveness of Debiasing Techniques for Pre-trained Language Models. ☆154 Aug 18, 2025 Updated 6 months ago
- AIS is an evaluation framework for assessing whether the output of natural language models only contains information about the external w… ☆31 Jan 14, 2023 Updated 3 years ago
- [NeurIPS 2024 D&B] Evaluating Copyright Takedown Methods for Language Models ☆17 Jul 17, 2024 Updated last year
- ☆22 Jul 18, 2024 Updated last year
- CopyBench: Measuring Literal and Non-Literal Reproduction of Copyright-Protected Text in Language Model Generation ☆14 Aug 19, 2025 Updated 6 months ago
- ☆14 Jul 27, 2020 Updated 5 years ago
- ScienceMeter: Tracking Scientific Knowledge Updates in Language Models ☆17 Jun 28, 2025 Updated 8 months ago
- StereoSet: Measuring stereotypical bias in pretrained language models ☆199 Dec 8, 2022 Updated 3 years ago
- [EMNLP 2021] Dataset and PyTorch Code for ExplaGraphs: An Explanation Graph Generation Task for Structured Commonsense Reasoning ☆15 Nov 5, 2022 Updated 3 years ago
- ☆10 Jul 27, 2018 Updated 7 years ago
- This repository contains the data and code introduced in the paper "CrowS-Pairs: A Challenge Dataset for Measuring Social Biases in Maske… ☆130 Mar 1, 2024 Updated 2 years ago
- Röttger et al. (NAACL 2024): "XSTest: A Test Suite for Identifying Exaggerated Safety Behaviours in Large Language Models" ☆129 Feb 24, 2025 Updated last year
- ☆23 Mar 8, 2024 Updated 2 years ago
- Augmenting Statistical Models with Natural Language Parameters ☆29 Sep 17, 2024 Updated last year
- Code & Data for the paper "RedditBias: A Real-World Resource for Bias Evaluation and Debiasing of Conversational Language Models" ☆32 May 31, 2021 Updated 4 years ago
- A neural text style transfer model ☆12 Jun 23, 2019 Updated 6 years ago
- TruthfulQA: Measuring How Models Imitate Human Falsehoods ☆890 Jan 16, 2025 Updated last year
- XCOPA: A Multilingual Dataset for Causal Commonsense Reasoning ☆105 Feb 4, 2021 Updated 5 years ago
- Human preference data for "Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback" ☆1,824 Jun 17, 2025 Updated 8 months ago
- "Why do I feel offended?" - Korean Dataset for Offensive Language Identification (EACL2023 Findings) ☆15 May 14, 2023 Updated 2 years ago
- A PyTorch Implementation of the EMNLP 2020 paper "Mitigating Gender Bias for Neural Dialogue Generation with Adversarial Learning" ☆13 Feb 20, 2021 Updated 5 years ago
- Butler is a tool project for automated service management and task scheduling. ☆16 Mar 2, 2026 Updated last week
- Function Vectors in Large Language Models (ICLR 2024) ☆192 Apr 17, 2025 Updated 10 months ago
- A Comprehensive Assessment of Trustworthiness in GPT Models ☆314 Sep 16, 2024 Updated last year
- The official code of TACL 2021, "Did Aristotle Use a Laptop? A Question Answering Benchmark with Implicit Reasoning Strategies". ☆85 Oct 31, 2022 Updated 3 years ago
- ☆175 May 28, 2019 Updated 6 years ago