hendrycks / ethics
Aligning AI With Shared Human Values (ICLR 2021)
☆314 Updated 2 years ago
Alternatives and similar repositories for ethics
Users who are interested in ethics are comparing it to the repositories listed below:
- Repository for research in the field of Responsible NLP at Meta. ☆205 Updated this week
- ☆117 Updated last year
- Dataset associated with "BOLD: Dataset and Metrics for Measuring Biases in Open-Ended Language Generation" paper ☆85 Updated 4 years ago
- ☆299 Updated 3 weeks ago
- ☆228 Updated 4 years ago
- datasets from the paper "Towards Understanding Sycophancy in Language Models" ☆102 Updated 2 years ago
- ☆144 Updated 6 months ago
- StereoSet: Measuring stereotypical bias in pretrained language models ☆197 Updated 3 years ago
- Repository for the Bias Benchmark for QA dataset. ☆136 Updated 2 years ago
- This repository contains the code for "Self-Diagnosis and Self-Debiasing: A Proposal for Reducing Corpus-Based Bias in NLP". ☆89 Updated 4 years ago
- PAIR.withgoogle.com and friends' work on interpretability methods ☆220 Updated this week
- MEND: Fast Model Editing at Scale ☆258 Updated 2 years ago
- Package to compute Mauve, a similarity score between neural text and human text. Install with `pip install mauve-text`. ☆307 Updated last year
- ACL 2022: An Empirical Survey of the Effectiveness of Debiasing Techniques for Pre-trained Language Models. ☆154 Updated 5 months ago
- Inspecting and Editing Knowledge Representations in Language Models ☆119 Updated 2 years ago
- LLM experiments done during SERI MATS - focusing on activation steering / interpreting activation spaces ☆100 Updated 2 years ago
- ☆328 Updated last year
- The official code of LM-Debugger, an interactive tool for inspection and intervention in transformer-based language models. ☆182 Updated 3 years ago
- A library for finding knowledge neurons in pretrained transformer models. ☆159 Updated 3 years ago
- The Prism Alignment Project ☆89 Updated last year
- Interpretability for sequence generation models 🐛 🔍 ☆453 Updated last week
- ☆138 Updated last year
- Synthetic question-answering dataset to formally analyze the chain-of-thought output of large language models on a reasoning task. ☆154 Updated 5 months ago
- Sparse probing paper full code. ☆66 Updated 2 years ago
- Utilities for the HuggingFace transformers library ☆74 Updated 3 years ago
- ☆23 Updated last year
- ☆250 Updated 3 years ago
- Accompanying repo for the RLPrompt paper ☆360 Updated last year
- ☆267 Updated last year
- This repo contains the code for generating the ToxiGen dataset, published at ACL 2022. ☆346 Updated last year