jprivera44 / EscalAItion
Repo for the paper on Escalation Risks of AI systems
☆44Updated last year
Alternatives and similar repositories for EscalAItion
Users who are interested in EscalAItion are comparing it to the repositories listed below
- ☆116Updated last week
- A virtual environment for developing and evaluating automated scientific discovery agents.☆199Updated 10 months ago
- ☆65Updated 2 weeks ago
- Governance of the Commons Simulation (GovSim)☆64Updated last year
- ☆105Updated 5 months ago
- General-Sum variant of the game Diplomacy for evaluating AIs.☆34Updated last year
- Code for "Utility Engineering: Analyzing and Controlling Emergent Value Systems in AIs"☆87Updated 11 months ago
- An OpenAI Gym environment to evaluate the ability of LLMs (e.g., GPT-4, Claude) in long-horizon reasoning and task planning in dynamic mult…☆73Updated 2 years ago
- Public repository containing METR's DVC pipeline for eval data analysis☆186Updated last week
- ☆144Updated 6 months ago
- An attribution library for LLMs☆46Updated last year
- ☆22Updated last year
- ☆86Updated 2 years ago
- Lamorel is a Python library designed for RL practitioners eager to use Large Language Models (LLMs).☆243Updated last month
- A preprint version of our recent research on the capability of frontier AI systems to do self-replication☆59Updated last year
- Repository for the paper Stream of Search: Learning to Search in Language☆152Updated 11 months ago
- ☆33Updated 7 months ago
- Automated Capability Discovery via Foundation Model Self-Exploration☆66Updated 11 months ago
- ScienceWorld is a text-based virtual environment centered around accomplishing tasks from the standardized elementary science curriculum.☆332Updated last month
- Automating enterprise workflows with multimodal agents☆114Updated last year
- [NeurIPS '23 Spotlight] Thought Cloning: Learning to Think while Acting by Imitating Human Thinking☆267Updated last year
- A library for benchmarking the long-term memory and continual learning capabilities of LLM-based agents. With all the tests and code you…☆82Updated last year
- A benchmark for evaluating learning agents based on just language feedback☆94Updated 7 months ago
- Open-ended wargames with large language models☆46Updated last week
- ☆57Updated last year
- ☆80Updated last year
- A repo to evaluate various LLMs' chess-playing abilities.☆87Updated last year
- Causal DAG Extraction from Text (DEFT)☆66Updated last year
- Formal-LLM: Integrating Formal Language and Natural Language for Controllable LLM-based Agents☆132Updated last year
- Intrinsic Motivation from Artificial Intelligence Feedback☆135Updated 2 years ago