aypan17 / machiavelli
☆128 · Updated 3 months ago
Alternatives and similar repositories for machiavelli:
Users interested in machiavelli compare it to the libraries listed below:
- Datasets from the paper "Towards Understanding Sycophancy in Language Models" ☆71 · Updated last year
- Code release for "Debating with More Persuasive LLMs Leads to More Truthful Answers" ☆98 · Updated 10 months ago
- General-Sum variant of the game Diplomacy for evaluating AIs. ☆29 · Updated 10 months ago
- LLM experiments done during SERI MATS - focusing on activation steering / interpreting activation spaces ☆88 · Updated last year
- A library for efficient patching and automatic circuit discovery. ☆53 · Updated this week
- Steering vectors for transformer language models in Pytorch / Huggingface ☆88 · Updated this week
- Steering Llama 2 with Contrastive Activation Addition ☆123 · Updated 8 months ago
- ☆253 · Updated 7 months ago
- ☆203 · Updated 4 months ago
- Code for the ICLR 2024 paper "How to catch an AI liar: Lie detection in black-box LLMs by asking unrelated questions" ☆64 · Updated 8 months ago
- Repository for the paper Stream of Search: Learning to Search in Language ☆137 · Updated 2 weeks ago
- ☆55 · Updated 3 months ago
- Algebraic value editing in pretrained language models ☆62 · Updated last year
- Inspecting and Editing Knowledge Representations in Language Models ☆112 · Updated last year
- A repository for transformer critique learning and generation ☆88 · Updated last year
- [NeurIPS 2023 D&B] Code repository for InterCode benchmark https://arxiv.org/abs/2306.14898 ☆208 · Updated 9 months ago
- ☆151 · Updated this week
- Contains random samples referenced in the paper "Sleeper Agents: Training Robustly Deceptive LLMs that Persist Through Safety Training". ☆93 · Updated 11 months ago
- ☆76 · Updated 6 months ago
- ☆109 · Updated 6 months ago
- ScienceWorld is a text-based virtual environment centered around accomplishing tasks from the standardized elementary science curriculum. ☆242 · Updated 4 months ago
- ☆78 · Updated 7 months ago
- Keeping language models honest by directly eliciting knowledge encoded in their activations. ☆195 · Updated last week
- Improving Alignment and Robustness with Circuit Breakers ☆185 · Updated 4 months ago
- Synthetic question-answering dataset to formally analyze the chain-of-thought output of large language models on a reasoning task. ☆135 · Updated 4 months ago
- Measuring the situational awareness of language models ☆34 · Updated last year
- ☆61 · Updated 3 weeks ago
- For OpenMOSS Mechanistic Interpretability Team's Sparse Autoencoder (SAE) research. ☆92 · Updated this week
- ☆52 · Updated this week
- Can Language Models Solve Olympiad Programming? ☆110 · Updated last month