hendrycks / ethics
Aligning AI With Shared Human Values (ICLR 2021)
☆289 Updated 2 years ago
Alternatives and similar repositories for ethics
Users who are interested in ethics are comparing it to the repositories listed below
- Repository for research in the field of Responsible NLP at Meta. ☆201 Updated 2 months ago
- Utilities for the HuggingFace transformers library ☆69 Updated 2 years ago
- ☆110 Updated last year
- Dataset associated with "BOLD: Dataset and Metrics for Measuring Biases in Open-Ended Language Generation" paper ☆79 Updated 4 years ago
- The official code of LM-Debugger, an interactive tool for inspection and intervention in transformer-based language models. ☆177 Updated 3 years ago
- StereoSet: Measuring stereotypical bias in pretrained language models ☆185 Updated 2 years ago
- Repository for the Bias Benchmark for QA dataset. ☆123 Updated last year
- LLM experiments done during SERI MATS - focusing on activation steering / interpreting activation spaces ☆95 Updated last year
- datasets from the paper "Towards Understanding Sycophancy in Language Models" ☆82 Updated last year
- PAIR.withgoogle.com and friend's work on interpretability methods ☆194 Updated this week
- ☆215 Updated 4 years ago
- This repository contains the code for "Self-Diagnosis and Self-Debiasing: A Proposal for Reducing Corpus-Based Bias in NLP". ☆88 Updated 3 years ago
- Package to compute Mauve, a similarity score between neural text and human text. Install with `pip install mauve-text`. ☆293 Updated last year
- ACL 2022: An Empirical Survey of the Effectiveness of Debiasing Techniques for Pre-trained Language Models. ☆139 Updated 7 months ago
- ☆273 Updated last year
- ☆294 Updated last week
- MEND: Fast Model Editing at Scale ☆247 Updated last year
- The Prism Alignment Project ☆79 Updated last year
- Inspecting and Editing Knowledge Representations in Language Models ☆116 Updated last year
- ☆283 Updated last year
- A library for finding knowledge neurons in pretrained transformer models. ☆158 Updated 3 years ago
- ☆231 Updated 9 months ago
- This repository contains the data and code introduced in the paper "CrowS-Pairs: A Challenge Dataset for Measuring Social Biases in Maske… ☆122 Updated last year
- Interpretability for sequence generation models 🐛 🔍 ☆431 Updated 2 months ago
- Sparse probing paper full code. ☆58 Updated last year
- ☆239 Updated 2 years ago
- ☆219 Updated last year
- ☆140 Updated last year
- Few-shot Learning of GPT-3 ☆352 Updated last year
- Tools for understanding how transformer predictions are built layer-by-layer ☆505 Updated last year