cicl-stanford / moca
Language model evaluation for morality and causality
☆16 Updated last year
Alternatives and similar repositories for moca:
Users interested in moca are comparing it to the libraries listed below.
- datasets from the paper "Towards Understanding Sycophancy in Language Models" ☆71 Updated last year
- 💻 Code and benchmark for our EMNLP 2023 paper - "FANToM: A Benchmark for Stress-testing Machine Theory of Mind in Interactions" ☆52 Updated 8 months ago
- A Toolkit for Distributional Control of Generative Models ☆70 Updated last year
- Teaching Models to Express Their Uncertainty in Words ☆36 Updated 2 years ago
- CausalGym: Benchmarking causal interpretability methods on linguistic tasks ☆40 Updated 2 months ago
- Repo for: When to Make Exceptions: Exploring Language Models as Accounts of Human Moral Judgment ☆38 Updated last year
- Apps built using Inspired Cognition's Critique. ☆58 Updated last year
- ☆22 Updated 11 months ago
- ☆45 Updated last year
- ☆35 Updated 2 years ago
- ☆76 Updated 6 months ago
- Inspecting and Editing Knowledge Representations in Language Models ☆112 Updated last year
- The Prism Alignment Project ☆66 Updated 9 months ago
- ☆31 Updated last year
- Super fast implementations of common benchmark text world games ☆45 Updated 2 months ago
- Resources for cultural NLP research ☆86 Updated last month
- Code for paper "Do Language Models Have Beliefs? Methods for Detecting, Updating, and Visualizing Model Beliefs" ☆28 Updated 2 years ago
- ☆27 Updated 11 months ago
- ☆50 Updated last year
- Code for "From Pretraining Data to Language Models to Downstream Tasks: Tracking the Trails of Political Biases Leading to Unfair NLP Mod… ☆33 Updated 11 months ago
- Code and Data for the NAACL 24 paper: MacGyver: Are Large Language Models Creative Problem Solvers? ☆25 Updated 10 months ago
- Data and code for the paper "The Moral Integrity Corpus: A Benchmark for Ethical Dialogue Systems" ☆19 Updated last year
- Code for "Tracing Knowledge in Language Models Back to the Training Data" ☆37 Updated 2 years ago
- ☆32 Updated last year
- ☆24 Updated 9 months ago
- ☆104 Updated 9 months ago
- Experiments on including metadata such as URLs, timestamps, website descriptions and HTML tags during pretraining. ☆31 Updated last year
- DialOp: Decision-oriented dialogue environments for collaborative language agents ☆106 Updated 3 months ago
- Experiments and code to generate the GINC small-scale in-context learning dataset from "An Explanation for In-context Learning as Implici… ☆103 Updated last year
- ☆21 Updated 4 months ago