anthropic-experimental / agentic-misalignment
☆494Updated 4 months ago
Alternatives and similar repositories for agentic-misalignment
Users who are interested in agentic-misalignment are comparing it to the libraries listed below
- ☆222Updated 7 months ago
- Prompts used in the Automated Auditing Blog Post☆114Updated 3 months ago
- open source interpretability platform 🧠☆455Updated 2 weeks ago
- An alignment auditing agent capable of quickly exploring alignment hypotheses☆626Updated this week
- Collection of evals for Inspect AI☆264Updated last week
- Open source interpretability artefacts for R1.☆163Updated 6 months ago
- Inference-time scaling for LLMs-as-a-judge.☆304Updated last month
- ☆476Updated 3 months ago
- ⚖️ Awesome LLM Judges ⚖️☆132Updated 6 months ago
- ☆172Updated last week
- Real-Time Detection of Hallucinated Entities in Long-Form Generation☆262Updated 2 weeks ago
- Testing baseline LLM performance across various models☆319Updated 3 weeks ago
- Vivaria is METR's tool for running evaluations and conducting agent elicitation research.☆118Updated last week
- Atropos is a Language Model Reinforcement Learning Environments framework for collecting and evaluating LLM trajectories through diverse …☆733Updated this week
- ☆135Updated 7 months ago
- Red-Teaming Language Models with DSPy☆235Updated 8 months ago
- [NeurIPS 2025 D&B Spotlight] Scaling Data for SWE-agents☆432Updated last week
- An agent benchmark with tasks in a simulated software company.☆570Updated 2 weeks ago
- ☆170Updated 10 months ago
- Training-Ready RL Environments + Evals☆132Updated last week
- A toolkit for describing model features and intervening on those features to steer behavior.☆209Updated 11 months ago
- This repository contains the toolkit for replicating results from our technical report.☆152Updated last month
- Public repository containing METR's DVC pipeline for eval data analysis☆124Updated 6 months ago
- A Tree Search Library with Flexible API for LLM Inference-Time Scaling☆479Updated last week
- A benchmark for LLMs on complicated tasks in the terminal☆961Updated last week
- Frontier Models playing the board game Diplomacy.☆597Updated last month
- Post-training with Tinker☆1,096Updated last week
- Collection of scripts and notebooks for OpenAI's latest GPT OSS models☆463Updated 2 months ago
- Persona Vectors: Monitoring and Controlling Character Traits in Language Models☆269Updated 3 months ago
- Sandboxed code execution for AI agents, locally or on the cloud. Massively parallel, easy to extend. Powering SWE-agent and more.☆349Updated this week