jprivera44 / EscalAItion
Repo for the paper on Escalation Risks of AI systems
☆38 · Updated last year
Alternatives and similar repositories for EscalAItion:
Users interested in EscalAItion are comparing it to the repositories listed below.
- ☆54 · Updated 6 months ago
- datasets from the paper "Towards Understanding Sycophancy in Language Models" ☆74 · Updated last year
- An OpenAI gym environment to evaluate the ability of LLMs (e.g. GPT-4, Claude) in long-horizon reasoning and task planning in dynamic mult… ☆68 · Updated last year
- An attribution library for LLMs ☆38 · Updated 7 months ago
- A benchmark for evaluating learning agents based on just language feedback ☆73 · Updated 3 weeks ago
- Learning to route instances for Human vs AI Feedback ☆23 · Updated 2 months ago
- Intelligent Go-Explore: Standing on the Shoulders of Giant Foundation Models ☆54 · Updated last month
- Interpreting how transformers simulate agents performing RL tasks ☆79 · Updated last year
- Code for "Utility Engineering: Analyzing and Controlling Emergent Value Systems in AIs" ☆45 · Updated last month
- Measuring the situational awareness of language models ☆34 · Updated last year
- ☆17 · Updated 6 months ago
- Dataset and benchmark for assessing LLMs in translating natural language descriptions of planning problems into PDDL ☆48 · Updated 6 months ago
- ☆18 · Updated 9 months ago
- General-Sum variant of the game Diplomacy for evaluating AIs ☆28 · Updated last year
- Demo of using ChatGPT API for language learning ☆12 · Updated 2 years ago
- Q-Probe: A Lightweight Approach to Reward Maximization for Language Models ☆41 · Updated 10 months ago
- Repo to reproduce the First-Explore paper results ☆37 · Updated 3 months ago
- A repository of projects and datasets under active development by Alignment Lab AI ☆22 · Updated last year
- A dataset of alignment research and code to reproduce it ☆77 · Updated last year
- A repository for transformer critique learning and generation ☆89 · Updated last year
- ☆48 · Updated 5 months ago
- Zeus LLM Trainer is a rewrite of Stanford Alpaca aiming to be the trainer for all Large Language Models ☆69 · Updated last year
- ☆72 · Updated 2 months ago
- Functional Benchmarks and the Reasoning Gap ☆85 · Updated 6 months ago
- Official repo for the paper PHUDGE: Phi-3 as Scalable Judge. Evaluate your LLMs with or without custom rubric, reference answer, absolute… ☆49 · Updated 9 months ago
- Exploitability calculation for imperfect-information game benchmarks ☆24 · Updated 2 weeks ago
- ☆68 · Updated last year
- ☆132 · Updated 5 months ago
- ☆81 · Updated last year
- 🔔🧠 Easily experiment with popular language agents across diverse reasoning/decision-making benchmarks! ☆51 · Updated last month