jprivera44 / EscalAItion
Repo for the paper on Escalation Risks of AI systems
☆38 · Updated last year
Alternatives and similar repositories for EscalAItion:
Users interested in EscalAItion are comparing it to the repositories listed below.
- ☆54 · Updated 6 months ago
- datasets from the paper "Towards Understanding Sycophancy in Language Models" ☆74 · Updated last year
- An OpenAI gym environment to evaluate the ability of LLMs (e.g. GPT-4, Claude) in long-horizon reasoning and task planning in dynamic mult… ☆68 · Updated last year
- An attribution library for LLMs ☆38 · Updated 7 months ago
- A benchmark for evaluating learning agents based on just language feedback ☆73 · Updated 3 weeks ago
- Learning to route instances for Human vs AI Feedback ☆23 · Updated 2 months ago
- Intelligent Go-Explore: Standing on the Shoulders of Giant Foundation Models ☆54 · Updated last month
- Interpreting how transformers simulate agents performing RL tasks ☆79 · Updated last year
- Code for "Utility Engineering: Analyzing and Controlling Emergent Value Systems in AIs" ☆45 · Updated last month
- Measuring the situational awareness of language models ☆34 · Updated last year
- ☆17 · Updated 6 months ago
- Dataset and benchmark for assessing LLMs in translating natural language descriptions of planning problems into PDDL ☆48 · Updated 6 months ago
- ☆18 · Updated 9 months ago
- General-Sum variant of the game Diplomacy for evaluating AIs ☆28 · Updated last year
- Demo of using ChatGPT API for language learning ☆12 · Updated 2 years ago
- Q-Probe: A Lightweight Approach to Reward Maximization for Language Models ☆41 · Updated 10 months ago
- Repo to reproduce the First-Explore paper results ☆37 · Updated 3 months ago
- A repository of projects and datasets under active development by Alignment Lab AI ☆22 · Updated last year
- A dataset of alignment research and code to reproduce it ☆77 · Updated last year
- A repository for transformer critique learning and generation ☆89 · Updated last year
- ☆48 · Updated 5 months ago
- Zeus LLM Trainer is a rewrite of Stanford Alpaca aiming to be the trainer for all Large Language Models ☆69 · Updated last year
- ☆72 · Updated 2 months ago
- Functional Benchmarks and the Reasoning Gap ☆85 · Updated 6 months ago
- Official repo for the paper PHUDGE: Phi-3 as Scalable Judge. Evaluate your LLMs with or without custom rubric, reference answer, absolute… ☆49 · Updated 9 months ago
- Exploitability calculation for imperfect-information game benchmarks ☆24 · Updated 2 weeks ago
- ☆68 · Updated last year
- ☆132 · Updated 5 months ago
- ☆81 · Updated last year
- 🔔🧠 Easily experiment with popular language agents across diverse reasoning/decision-making benchmarks! ☆51 · Updated last month