cicl-stanford / moca
Language model evaluation for morality and causality
Related projects
Alternatives and complementary repositories for moca
- Apps built using Inspired Cognition's Critique.
- 👻 Code and benchmark for our EMNLP 2023 paper - "FANToM: A Benchmark for Stress-testing Machine Theory of Mind in Interactions"
- The Prism Alignment Project
- A corpus and code for understanding norms and subjectivity. 🤖
- Datasets from the paper "Towards Understanding Sycophancy in Language Models"
- DialOp: Decision-oriented dialogue environments for collaborative language agents
- Experiments on including metadata such as URLs, timestamps, website descriptions and HTML tags during pretraining.
- A Python library that encapsulates various methods for neuron interpretation and analysis in Deep NLP models.
- A Python Commonsense Knowledge Inference Toolkit
- Code for "From Pretraining Data to Language Models to Downstream Tasks: Tracking the Trails of Political Biases Leading to Unfair NLP Mod…
- Super fast implementations of common benchmark text world games
- CausalGym: Benchmarking causal interpretability methods on linguistic tasks
- Minimum Bayes Risk Decoding for Hugging Face Transformers
- Teaching Models to Express Their Uncertainty in Words
- Resources for cultural NLP research
- Data and code for the "Moral Stories: Situated Reasoning about Norms, Intents, Actions, and their Consequences" (Emelin et al., 2021) pap…
- Code accompanying our papers on the "Generative Distributional Control" framework
- A Toolkit for Distributional Control of Generative Models
- Evaluating the Moral Beliefs Encoded in LLMs
- Repo for: When to Make Exceptions: Exploring Language Models as Accounts of Human Moral Judgment