anthropic-experimental / agentic-misalignment
☆340 Updated last month
Alternatives and similar repositories for agentic-misalignment
Users who are interested in agentic-misalignment are comparing it to the repositories listed below
- ☆182 Updated 4 months ago
- Inference-time scaling for LLMs-as-a-judge. ☆267 Updated 3 weeks ago
- An agent benchmark with tasks in a simulated software company. ☆515 Updated last week
- Prompts used in the Automated Auditing Blog Post ☆74 Updated last week
- Red-Teaming Language Models with DSPy ☆203 Updated 5 months ago
- A Text-Based Environment for Interactive Debugging ☆250 Updated this week
- Open source interpretability artefacts for R1. ☆157 Updated 3 months ago
- ⚖️ Awesome LLM Judges ⚖️ ☆108 Updated 3 months ago
- Testing baseline LLMs performance across various models ☆291 Updated last week
- Collection of evals for Inspect AI ☆198 Updated this week
- Vivaria is METR's tool for running evaluations and conducting agent elicitation research. ☆103 Updated last week
- A Tree Search Library with Flexible API for LLM Inference-Time Scaling ☆423 Updated this week
- open source interpretability platform 🧠 ☆311 Updated this week
- ☆102 Updated this week
- ☆130 Updated 4 months ago
- ☆24 Updated 9 months ago
- Scaling Data for SWE-agents ☆328 Updated this week
- Atropos is a Language Model Reinforcement Learning Environments framework for collecting and evaluating LLM trajectories through diverse … ☆573 Updated this week
- ☆455 Updated 2 weeks ago
- Open Source Replication of Anthropic's Alignment Faking Paper ☆47 Updated 4 months ago
- Letting Claude Code develop his own MCP tools :) ☆122 Updated 4 months ago
- A comprehensive repository of reasoning tasks for LLMs (and beyond) ☆448 Updated 10 months ago
- A benchmark for LLMs on complicated tasks in the terminal ☆322 Updated this week
- Public repository containing METR's DVC pipeline for eval data analysis ☆86 Updated 4 months ago
- ☆138 Updated 7 months ago
- Sphynx Hallucination Induction ☆53 Updated 6 months ago
- Source code for the collaborative reasoner research project at Meta FAIR. ☆99 Updated 3 months ago
- Sandboxed code execution for AI agents, locally or on the cloud. Massively parallel, easy to extend. Powering SWE-agent and more. ☆273 Updated last week
- [ACL 2024] Do Large Language Models Latently Perform Multi-Hop Reasoning? ☆72 Updated 4 months ago
- Training teachers with reinforcement learning able to make LLMs learn how to reason for test time scaling. ☆324 Updated last month