anthropic-experimental / agentic-misalignment
☆542 Updated 6 months ago
Alternatives and similar repositories for agentic-misalignment
Users who are interested in agentic-misalignment are comparing it to the repositories listed below
- An alignment auditing agent capable of quickly exploring alignment hypotheses ☆776 Updated 2 weeks ago
- ☆236 Updated last month
- Prompts used in the Automated Auditing Blog Post ☆128 Updated 5 months ago
- Inference-time scaling for LLMs-as-a-judge. ☆320 Updated last month
- Open source interpretability artefacts for R1. ☆165 Updated 8 months ago
- Collection of evals for Inspect AI ☆320 Updated last week
- open source interpretability platform 🧠 ☆590 Updated this week
- ☆481 Updated 5 months ago
- Real-Time Detection of Hallucinated Entities in Long-Form Generation ☆273 Updated last month
- Testing baseline LLM performance across various models ☆330 Updated last month
- Vivaria is METR's tool for running evaluations and conducting agent elicitation research. ☆128 Updated last month
- ☆208 Updated this week
- ⚖️ Awesome LLM Judges ⚖️ ☆146 Updated 8 months ago
- An interface library for RL post training with environments. ☆944 Updated this week
- Red-Teaming Language Models with DSPy ☆248 Updated 10 months ago
- Inference API for many LLMs and other useful tools for empirical research ☆90 Updated last week
- Atropos is a Language Model Reinforcement Learning Environments framework for collecting and evaluating LLM trajectories through diverse … ☆780 Updated this week
- A toolkit for describing model features and intervening on those features to steer behavior. ☆224 Updated 3 weeks ago
- Curated collection of community environments ☆196 Updated last week
- Provider-agnostic, open-source evaluation infrastructure for language models ☆698 Updated last week
- Public repository containing METR's DVC pipeline for eval data analysis ☆168 Updated 8 months ago
- [NeurIPS 2025 D&B Spotlight] Scaling Data for SWE-agents ☆492 Updated last week
- Super basic implementation (gist-like) of RLMs with REPL environments. ☆290 Updated 2 months ago
- This repository contains the toolkit for replicating results from our technical report. ☆185 Updated 3 months ago
- An agent benchmark with tasks in a simulated software company. ☆617 Updated last month
- Harbor is a framework for running agent evaluations and creating and using RL environments. ☆241 Updated last week
- Collection of scripts and notebooks for OpenAI's latest GPT OSS models ☆486 Updated 4 months ago
- ControlArena is a collection of settings, model organisms and protocols for running control experiments. ☆144 Updated 2 weeks ago
- ☆104 Updated 5 months ago
- ☆309 Updated 2 weeks ago