jprivera44 / EscalAItion
Repo for the paper on Escalation Risks of AI systems
☆44 Updated last year
Alternatives and similar repositories for EscalAItion
Users who are interested in EscalAItion are comparing it to the repositories listed below.
- ☆138 Updated 2 months ago
- Code for "Utility Engineering: Analyzing and Controlling Emergent Value Systems in AIs" ☆55 Updated 7 months ago
- ☆57 Updated this week
- Governance of the Commons Simulation (GovSim) ☆59 Updated 8 months ago
- ☆97 Updated last month
- Contains random samples referenced in the paper "Sleeper Agents: Training Robustly Deceptive LLMs that Persist Through Safety Training". ☆116 Updated last year
- ☆20 Updated last year
- ☆103 Updated this week
- Formal-LLM: Integrating Formal Language and Natural Language for Controllable LLM-based Agents ☆128 Updated last year
- An attribution library for LLMs ☆42 Updated last year
- A preprint version of our recent research on the capability of frontier AI systems to do self-replication ☆59 Updated 9 months ago
- Keeping language models honest by directly eliciting knowledge encoded in their activations. ☆209 Updated this week
- Repository for the paper Stream of Search: Learning to Search in Language ☆151 Updated 7 months ago
- ☆53 Updated last year
- ☆300 Updated last year
- General-Sum variant of the game Diplomacy for evaluating AIs. ☆30 Updated last year
- A virtual environment for developing and evaluating automated scientific discovery agents. ☆185 Updated 6 months ago
- A repo to evaluate various LLMs' chess-playing abilities. ☆83 Updated last year
- Collection of Tree of Thoughts prompting techniques I've found useful to start with, then stylize, then iterate ☆92 Updated last year
- Sphynx Hallucination Induction ☆53 Updated 7 months ago
- ☆133 Updated this week
- A toolkit for describing model features and intervening on those features to steer behavior. ☆202 Updated 10 months ago
- A codebase for "Language Models can Solve Computer Tasks" ☆235 Updated last year
- [NeurIPS '23 Spotlight] Thought Cloning: Learning to Think while Acting by Imitating Human Thinking ☆270 Updated last year
- Source code for our paper: "SelfGoal: Your Language Agents Already Know How to Achieve High-level Goals". ☆69 Updated last year
- datasets from the paper "Towards Understanding Sycophancy in Language Models" ☆92 Updated last year
- Archon provides a modular framework for combining different inference-time techniques and LMs with just a JSON config file. ☆183 Updated 6 months ago
- A DSPy-based implementation of the tree of thoughts method (Yao et al., 2023) for generating persuasive arguments ☆89 Updated 11 months ago
- Mixing Language Models with Self-Verification and Meta-Verification ☆110 Updated 9 months ago
- Use the OpenAI Batch tool to make async batch requests to the OpenAI API. ☆100 Updated last year