cicl-stanford / moca
Language model evaluation for morality and causality
☆19 · Updated 2 years ago
Alternatives and similar repositories for moca
Users who are interested in moca compare it to the libraries listed below.
- ☆79 · Updated last year
- 💻 Code and benchmark for our EMNLP 2023 paper - "FANToM: A Benchmark for Stress-testing Machine Theory of Mind in Interactions" ☆56 · Updated last year
- Apps built using Inspired Cognition's Critique. ☆57 · Updated 2 years ago
- A Python library that encapsulates various methods for neuron interpretation and analysis in Deep NLP models. ☆105 · Updated 2 years ago
- A Toolkit for Distributional Control of Generative Models ☆73 · Updated 3 months ago
- We develop benchmarks and analysis tools to evaluate the causal reasoning abilities of LLMs. ☆133 · Updated last year
- Repository for research in the field of Responsible NLP at Meta. ☆202 · Updated 6 months ago
- PAIR.withgoogle.com and friend's work on interpretability methods ☆214 · Updated this week
- ☆26 · Updated 9 months ago
- CausalGym: Benchmarking causal interpretability methods on linguistic tasks ☆49 · Updated 11 months ago
- Code repository for the paper "Mission: Impossible Language Models." ☆54 · Updated 2 months ago
- ☆100 · Updated last year
- [ICLR 2023] Guess the Instruction! Flipped Learning Makes Language Models Stronger Zero-Shot Learners ☆116 · Updated 5 months ago
- Discovering Data-driven Hypotheses in the Wild ☆118 · Updated 5 months ago
- datasets from the paper "Towards Understanding Sycophancy in Language Models" ☆97 · Updated 2 years ago
- ☆111 · Updated 9 months ago
- Experiments and code to generate the GINC small-scale in-context learning dataset from "An Explanation for In-context Learning as Implici… ☆106 · Updated 2 years ago
- ☆49 · Updated 2 years ago
- The evaluation pipeline for the 2024 BabyLM Challenge. ☆33 · Updated last year
- ☆36 · Updated 4 months ago
- ☆212 · Updated 2 years ago
- The official code of LM-Debugger, an interactive tool for inspection and intervention in transformer-based language models. ☆180 · Updated 3 years ago
- Super fast implementations of common benchmark text world games ☆51 · Updated 3 months ago
- Inspecting and Editing Knowledge Representations in Language Models ☆119 · Updated 2 years ago
- ☆29 · Updated 10 months ago
- [ICLR 2024 Spotlight] FLASK: Fine-grained Language Model Evaluation based on Alignment Skill Sets ☆218 · Updated last year
- ☆36 · Updated 2 years ago
- Aligning AI With Shared Human Values (ICLR 2021) ☆304 · Updated 2 years ago
- ☆141 · Updated 3 years ago
- The Prism Alignment Project ☆86 · Updated last year