marcelbinz / CENTaUR
☆23 Updated 7 months ago
Alternatives and similar repositories for CENTaUR:
Users who are interested in CENTaUR are comparing it to the libraries listed below:
- ☆76 Updated 6 months ago
- ☆28 Updated this week
- 💻 Code and benchmark for our EMNLP 2023 paper - "FANToM: A Benchmark for Stress-testing Machine Theory of Mind in Interactions" ☆52 Updated 7 months ago
- DialOp: Decision-oriented dialogue environments for collaborative language agents ☆103 Updated 2 months ago
- ☆15 Updated 6 months ago
- Governance of the Commons Simulation (GovSim) ☆31 Updated 6 months ago
- ☆125 Updated 2 months ago
- Hypothetical Minds is an autonomous LLM-based agent for diverse multi-agent settings, integrating a Theory of Mind module Theory of Mind … ☆23 Updated 6 months ago
- ☆65 Updated 9 months ago
- Super fast implementations of common benchmark text world games ☆44 Updated last month
- Dataset and benchmark for assessing LLMs in translating natural language descriptions of planning problems into PDDL ☆48 Updated 3 months ago
- [EMNLP 2020] Keep CALM and Explore: Language Models for Action Generation in Text-based Games ☆68 Updated 3 years ago
- A benchmark for evaluating learning agents based on just language feedback ☆61 Updated 3 months ago
- Repository for the paper Stream of Search: Learning to Search in Language ☆119 Updated 5 months ago
- Repo for: When to Make Exceptions: Exploring Language Models as Accounts of Human Moral Judgment ☆38 Updated last year
- Evaluating the Moral Beliefs Encoded in LLMs ☆23 Updated last month
- Repository for the code of the "PPL-MCTS: Constrained Textual Generation Through Discriminator-Guided Decoding" paper, NAACL'22 ☆65 Updated 2 years ago
- ☆55 Updated 2 months ago
- A repository for transformer critique learning and generation ☆88 Updated last year
- This repo contains code for our NeurIPS 2023 spotlight paper: Evaluating and Inducing Personality in Pre-trained Language Models ☆49 Updated last year
- ☆25 Updated 4 months ago
- CausalGym: Benchmarking causal interpretability methods on linguistic tasks ☆40 Updated last month
- Composable inference algorithms with LLMs and programmable logic ☆56 Updated last month
- ☆20 Updated 7 months ago
- Open source replication of Anthropic's Crosscoders for Model Diffing ☆28 Updated 2 months ago
- Datasets from the paper "Towards Understanding Sycophancy in Language Models" ☆66 Updated last year
- We develop benchmarks and analysis tools to evaluate the causal reasoning abilities of LLMs. ☆105 Updated 7 months ago
- ☆58 Updated 2 years ago
- ☆123 Updated 11 months ago
- ☆27 Updated last year