aypan17 / machiavelliLinks

☆137

Alternatives and similar repositories for machiavelli

Users that are interested in machiavelli are comparing it to the libraries listed below

Sorting:

meg-tong / sycophancy-eval
datasets from the paper "Towards Understanding Sycophancy in Language Models"
☆85Updated last year
ucl-dark / llm_debate
Code release for "Debating with More Persuasive LLMs Leads to More Truthful Answers"
☆113Updated last year
abdulhaim / LMRL-Gym
☆99Updated last year
redwoodresearch / Easy-Transformer
☆121Updated 11 months ago
METR / RE-Bench
☆94Updated 3 months ago
mukobi / welfare-diplomacy
General-Sum variant of the game Diplomacy for evaluating AIs.
☆29Updated last year
EleutherAI / elk
Keeping language models honest by directly eliciting knowledge encoded in their activations.
☆207Updated last week
anthropics / sleeper-agents-paper
Contains random samples referenced in the paper "Sleeper Agents: Training Robustly Deceptive LLMs that Persist Through Safety Training".
☆111Updated last year
allenai / ScienceWorld
ScienceWorld is a text-based virtual environment centered around accomplishing tasks from the standardized elementary science curriculum.
☆278Updated 2 weeks ago
nrimsky / LM-exp
LLM experiments done during SERI MATS - focusing on activation steering / interpreting activation spaces
☆96Updated last year
likenneth / othello_world
Emergent world representations: Exploring a sequence model trained on a synthetic task
☆184Updated 2 years ago
gabegrand / world-models
☆208Updated 2 years ago
LoryPack / LLM-LieDetector
Code for the ICLR 2024 paper "How to catch an AI liar: Lie detection in black-box LLMs by asking unrelated questions"
☆71Updated last year
redwoodresearch / alignment_faking_public
☆69Updated 2 months ago
UFO-101 / auto-circuit
A library for efficient patching and automatic circuit discovery.
☆73Updated last week
ArthurConmy / Automatic-Circuit-Discovery
☆233Updated 10 months ago
steering-vectors / steering-vectors
Steering vectors for transformer language models in Pytorch / Huggingface
☆119Updated 5 months ago
Sea-Snell / Implicit-Language-Q-Learning
Official code from the paper "Offline RL for Natural Language Generation with Implicit Language Q Learning"
☆208Updated 2 years ago
asaparov / prontoqa
Synthetic question-answering dataset to formally analyze the chain-of-thought output of large language models on a reasoning task.
☆147Updated 9 months ago
ryoungj / ToolEmu
[ICLR'24 Spotlight] A language model (LM)-based emulation framework for identifying the risks of LM agents with tool use
☆151Updated last year
EleutherAI / elk-generalization
Investigating the generalization behavior of LM probes trained to predict truth labels: (1) from one annotator to another, and (2) from e…
☆28Updated last year
anthropics / ConstitutionalHarmlessnessPaper
☆239Updated 2 years ago
google-deepmind / dangerous-capability-evaluations
☆55Updated 10 months ago
anthropics / evals
☆285Updated last year
princeton-nlp / USACO
Can Language Models Solve Olympiad Programming?
☆119Updated 6 months ago
montemac / activation_additions
Algebraic value editing in pretrained language models
☆65Updated last year
GFNOrg / gfn-lm-tuning
☆183Updated last year
jlin816 / dialop
DialOp: Decision-oriented dialogue environments for collaborative language agents
☆109Updated 8 months ago
McGill-NLP / VinePPO
Code for the paper "VinePPO: Unlocking RL Potential For LLM Reasoning Through Refined Credit Assignment"
☆167Updated 2 months ago
HannahKirk / prism-alignment
The Prism Alignment Project
☆79Updated last year