jprivera44 / EscalAItion
Repo for the paper on Escalation Risks of AI systems
☆43Updated last year
Alternatives and similar repositories for EscalAItion
Users who are interested in EscalAItion are comparing it to the repositories listed below.
- ☆57Updated last month
- ☆20Updated last year
- Intelligent Go-Explore: Standing on the Shoulders of Giant Foundation Models☆63Updated 6 months ago
- Formal-LLM: Integrating Formal Language and Natural Language for Controllable LLM-based Agents☆127Updated last year
- ☆138Updated last month
- ☆101Updated 5 months ago
- ☆84Updated last year
- Code for "Utility Engineering: Analyzing and Controlling Emergent Value Systems in AIs"☆54Updated 6 months ago
- A benchmark for evaluating learning agents based on just language feedback☆87Updated 2 months ago
- Automated Capability Discovery via Foundation Model Self-Exploration☆63Updated 6 months ago
- ☆85Updated last month
- A virtual environment for developing and evaluating automated scientific discovery agents.☆181Updated 5 months ago
- [NeurIPS '23 Spotlight] Thought Cloning: Learning to Think while Acting by Imitating Human Thinking☆270Updated last year
- ☆73Updated 4 months ago
- An attribution library for LLMs☆42Updated 11 months ago
- General-Sum variant of the game Diplomacy for evaluating AIs.☆29Updated last year
- ☆122Updated 3 weeks ago
- How to create rational LLM-based agents? Using game-theoretic workflows!☆75Updated 2 months ago
- Governance of the Commons Simulation (GovSim)☆57Updated 7 months ago
- SmartPlay is a benchmark for Large Language Models (LLMs). Uses a variety of games to test various important LLM capabilities as agents. …☆140Updated last year
- Contains random samples referenced in the paper "Sleeper Agents: Training Robustly Deceptive LLMs that Persist Through Safety Training".☆112Updated last year
- datasets from the paper "Towards Understanding Sycophancy in Language Models"☆89Updated last year
- ☆295Updated last year
- A preprint version of our recent research on the capability of frontier AI systems to do self-replication☆59Updated 8 months ago
- Repository for the paper Stream of Search: Learning to Search in Language☆150Updated 7 months ago
- ☆98Updated 4 months ago
- A toolkit for describing model features and intervening on those features to steer behavior.☆198Updated 9 months ago
- ☆52Updated last year
- ☆26Updated 3 months ago
- A repo to evaluate various LLMs' chess-playing abilities.☆83Updated last year