anthropic-experimental / agentic-misalignment
☆261Updated 3 weeks ago
Alternatives and similar repositories for agentic-misalignment
Users who are interested in agentic-misalignment are comparing it to the libraries listed below
- Inference-time scaling for LLMs-as-a-judge.☆250Updated last week
- Testing baseline LLMs performance across various models☆284Updated this week
- ⚖️ Awesome LLM Judges ⚖️☆107Updated 2 months ago
- A Text-Based Environment for Interactive Debugging☆234Updated this week
- ☆168Updated 4 months ago
- Letting Claude Code develop his own MCP tools :)☆114Updated 4 months ago
- Atropos is a Language Model Reinforcement Learning Environments framework for collecting and evaluating LLM trajectories through diverse …☆535Updated this week
- ☆128Updated 3 months ago
- An agent benchmark with tasks in a simulated software company.☆488Updated last week
- A comprehensive repository of reasoning tasks for LLMs (and beyond)☆447Updated 9 months ago
- Red-Teaming Language Models with DSPy☆202Updated 5 months ago
- A Tree Search Library with Flexible API for LLM Inference-Time Scaling☆383Updated this week
- Sandboxed code execution for AI agents, locally or on the cloud. Massively parallel, easy to extend. Powering SWE-agent and more.☆242Updated last week
- Open source interpretability artefacts for R1.☆154Updated 2 months ago
- Vivaria is METR's tool for running evaluations and conducting agent elicitation research.☆99Updated 2 weeks ago
- llm-consortium orchestrates multiple LLMs, iteratively refines & achieves consensus.☆250Updated last week
- Claude Deep Research config for Claude Code.☆193Updated 3 months ago
- A subset of jailbreaks automatically discovered by the Haize Labs haizing suite.☆92Updated 3 months ago
- Open-source resources on agents for computer use.☆354Updated 5 months ago
- Aidan Bench attempts to measure <big_model_smell> in LLMs.☆306Updated 2 weeks ago
- ☆132Updated 6 months ago
- [ACL 2024] Do Large Language Models Latently Perform Multi-Hop Reasoning?☆71Updated 3 months ago
- Together Open Deep Research☆320Updated 2 months ago
- Scaling Data for SWE-agents☆293Updated this week
- II-Researcher: a new open-source framework designed to aid building search / research agents☆425Updated 2 weeks ago
- Build hours code to share.☆379Updated 3 weeks ago
- Training teachers with reinforcement learning able to make LLMs learn how to reason for test time scaling.☆295Updated 3 weeks ago
- Public repository containing METR's DVC pipeline for eval data analysis☆78Updated 3 months ago
- An MCP Server that's also an MCP Client. Useful for letting Claude develop and test MCPs without needing to reset the application.☆121Updated 4 months ago
- Collection of evals for Inspect AI☆178Updated this week