AndonLabs / multiagent-inspect
☆19 · Updated last year
Alternatives and similar repositories for multiagent-inspect
Users who are interested in multiagent-inspect are comparing it to the libraries listed below.
- A framework for optimizing DSPy programs with RL · ☆308 · Updated 3 weeks ago
- Inference-time scaling for LLMs-as-a-judge. · ☆328 · Updated 3 months ago
- Prompt engineering, automated. · ☆352 · Updated 9 months ago
- Prompts used in the Automated Auditing Blog Post · ☆137 · Updated 6 months ago
- Lightly-reviewed collection of community environments · ☆210 · Updated last week
- A fully customizable and self-hosted sandboxing solution for AI agent code execution and computer use. It features out-of-the-box support… · ☆755 · Updated 8 months ago
- Atropos is a Language Model Reinforcement Learning Environments framework for collecting and evaluating LLM trajectories through diverse … · ☆853 · Updated this week
- ☆313 · Updated last month
- Harbor is a framework for running agent evaluations and creating and using RL environments. · ☆542 · Updated this week
- ☆140 · Updated 11 months ago
- rl from zero pretrain, can it be done? yes. · ☆286 · Updated 4 months ago
- A cache for AI agents to learn and replay complex behaviors. · ☆757 · Updated 7 months ago
- Testing baseline LLMs performance across various models · ☆336 · Updated this week
- A comprehensive repository of reasoning tasks for LLMs (and beyond) · ☆458 · Updated last year
- ☆53 · Updated this week
- ⚖️ Awesome LLM Judges ⚖️ · ☆161 · Updated 9 months ago
- Coding problems used in aider's polyglot benchmark · ☆199 · Updated last year
- ☆20 · Updated 8 months ago
- Claude Deep Research config for Claude Code. · ☆226 · Updated 10 months ago
- The State Of The Art, intelligence · ☆157 · Updated 5 months ago
- Deep Research for your internal data · ☆351 · Updated 8 months ago
- Memory library for building stateful agents · ☆342 · Updated this week
- ☆624 · Updated 5 months ago
- AutoEvals is a tool for quickly and easily evaluating AI model outputs using best practices. · ☆802 · Updated this week
- Sandboxed code execution for AI agents, locally or on the cloud. Massively parallel, easy to extend. Powering SWE-agent and more. · ☆430 · Updated this week
- Weaving prompts and code into structured, resilient patterns that won't unravel under pressure. · ☆30 · Updated 2 months ago
- Build your own visual reasoning model · ☆418 · Updated 3 weeks ago
- GRPO training code which scales to 32xH100s for long horizon terminal/coding tasks. Base agent is now the top Qwen3 agent on Stanford's T… · ☆344 · Updated 5 months ago
- ☆223 · Updated this week
- Cloudstate is a JavaScript database runtime. · ☆207 · Updated 7 months ago