hendrycks / ethics
Aligning AI With Shared Human Values (ICLR 2021)
☆305 Updated 2 years ago
Alternatives and similar repositories for ethics
Users who are interested in ethics are comparing it to the repositories listed below.
- Repository for research in the field of Responsible NLP at Meta. ☆204 Updated 6 months ago
- Dataset associated with "BOLD: Dataset and Metrics for Measuring Biases in Open-Ended Language Generation" paper ☆84 Updated 4 years ago
- ☆116 Updated last year
- ACL 2022: An Empirical Survey of the Effectiveness of Debiasing Techniques for Pre-trained Language Models. ☆151 Updated 3 months ago
- StereoSet: Measuring stereotypical bias in pretrained language models ☆194 Updated 3 years ago
- ☆224 Updated 4 years ago
- Repository for the Bias Benchmark for QA dataset. ☆133 Updated last year
- This repository contains the code for "Self-Diagnosis and Self-Debiasing: A Proposal for Reducing Corpus-Based Bias in NLP". ☆88 Updated 4 years ago
- The Prism Alignment Project ☆86 Updated last year
- Package to compute Mauve, a similarity score between neural text and human text. Install with `pip install mauve-text`. ☆303 Updated last year
- Utilities for the HuggingFace transformers library ☆72 Updated 2 years ago
- Inspecting and Editing Knowledge Representations in Language Models ☆119 Updated 2 years ago
- LLM experiments done during SERI MATS, focusing on activation steering and interpreting activation spaces ☆100 Updated 2 years ago
- ☆293 Updated last week
- Sparse probing paper full code.