giorgiopiatti / GovSim
Governance of the Commons Simulation (GovSim)
☆51 Updated 5 months ago
Alternatives and similar repositories for GovSim
Users who are interested in GovSim are comparing it to the libraries listed below.
- Code release for "Debating with More Persuasive LLMs Leads to More Truthful Answers" ☆109 Updated last year
- Stanford NLP Python library for benchmarking the utility of LLM interpretability methods ☆95 Updated 2 weeks ago
- [ICML 2025] Flow of Reasoning: Training LLMs for Divergent Reasoning with Minimal Examples ☆95 Updated 2 weeks ago
- How to create rational LLM-based agents? Using game-theoretic workflows! ☆72 Updated 2 weeks ago
- ☆69 Updated last year
- datasets from the paper "Towards Understanding Sycophancy in Language Models" ☆81 Updated last year
- [NeurIPS 2024] Knowledge Circuits in Pretrained Transformers ☆148 Updated 4 months ago
- [ICLR'24 Spotlight] A language model (LM)-based emulation framework for identifying the risks of LM agents with tool use ☆145 Updated last year
- Open source interpretability artefacts for R1. ☆149 Updated 2 months ago
- Code for the ICLR 2024 paper "How to catch an AI liar: Lie detection in black-box LLMs by asking unrelated questions" ☆70 Updated last year
- ☆43 Updated last month
- Sotopia: an Open-ended Social Learning Environment (ICLR 2024 spotlight) ☆224 Updated last week
- ☆25 Updated last year
- Sotopia-π: Interactive Learning of Socially Intelligent Language Agents (ACL 2024) ☆65 Updated last year
- [arXiv] EvalTree: Profiling Language Model Weaknesses via Hierarchical Capability Trees ☆21 Updated 3 months ago
- A mechanistic approach for understanding and detecting factual errors of large language models. ☆46 Updated 11 months ago
- ☆50 Updated 3 months ago
- ☆95 Updated 11 months ago
- ☆27 Updated 4 months ago
- Steering vectors for transformer language models in Pytorch / Huggingface ☆108 Updated 4 months ago
- ☆133 Updated 7 months ago
- [NeurIPS 2024] GTBench: Uncovering the Strategic Reasoning Limitations of LLMs via Game-Theoretic Evaluations ☆62 Updated 9 months ago
- [ACL 2024] Exploring Collaboration Mechanisms for LLM Agents: A Social Psychology View ☆118 Updated 2 weeks ago
- PaCE: Parsimonious Concept Engineering for Large Language Models (NeurIPS 2024) ☆37 Updated 7 months ago
- ☆19 Updated 11 months ago
- ☆114 Updated 5 months ago
- ☆33 Updated 4 months ago
- A resource repository for representation engineering in large language models ☆126 Updated 7 months ago
- [ACL'25 Oral] What Happened in LLMs Layers when Trained for Fast vs. Slow Thinking: A Gradient Perspective ☆65 Updated this week
- Function Vectors in Large Language Models (ICLR 2024) ☆170 Updated 2 months ago