AndonLabs / multiagent-inspect
☆11 · Updated 2 months ago
Alternatives and similar repositories for multiagent-inspect:
Users who are interested in multiagent-inspect are comparing it to the libraries listed below:
- Draw more samples ☆189 · Updated 9 months ago
- Verdict is a library for scaling judge-time compute. ☆195 · Updated 3 weeks ago
- Prompt engineering, automated. ☆299 · Updated 3 weeks ago
- ⚖️ Awesome LLM Judges ⚖️ ☆90 · Updated last month
- ☆48 · Updated this week
- Claude Deep Research config for Claude Code. ☆165 · Updated last month
- VS Code extension to convert computationally intensive PyTorch kernels to Triton ☆22 · Updated 6 months ago
- ☆71 · Updated 2 months ago
- Multi-language code navigation API in a container ☆74 · Updated 3 weeks ago
- ☆107 · Updated 3 months ago
- Fine-tuning and serving LLMs on any cloud ☆89 · Updated last year
- Sphynx Hallucination Induction ☆53 · Updated 2 months ago
- 🚀 Easy, open-source LLM finetuning with one-line commands, seamless cloud integration, and popular optimization frameworks. ✨ ☆90 · Updated 8 months ago
- A fully customizable and self-hosted sandboxing solution for AI agent code execution and computer use. It features out-of-the-box support… ☆298 · Updated this week
- Letting Claude Code develop its own MCP tools :) ☆97 · Updated last month
- An Open-Source AI Writing Project. ☆133 · Updated this week
- Aidan Bench attempts to measure <big_model_smell> in LLMs. ☆290 · Updated this week
- Gumloop Unified Model Context Protocol (guMCP) ☆324 · Updated this week
- Cloudstate is a JavaScript database runtime. ☆175 · Updated last month
- Foyle is a copilot to help developers deploy and operate their applications. ☆125 · Updated last month
- Find the samples, in the test data, on which your (generative) model makes mistakes. ☆26 · Updated 6 months ago
- A comprehensive repository of reasoning tasks for LLMs (and beyond) ☆429 · Updated 6 months ago
- Sister project to OpenLLMetry, but in TypeScript. Open-source observability for your LLM application, based on OpenTelemetry ☆307 · Updated 3 weeks ago
- An MCP Server that's also an MCP Client. Useful for letting Claude develop and test MCPs without needing to reset the application. ☆113 · Updated last month
- Applying SAEs for fine-grained control ☆17 · Updated 4 months ago
- Spongecake is the easiest way to launch computer use agents. ☆124 · Updated this week
- A reading list of relevant papers and projects on foundation model annotation ☆25 · Updated last month
- Enriched Python function call graphs for agents and coding assistants ☆75 · Updated 3 weeks ago
- Vivaria is METR's tool for running evaluations and conducting agent elicitation research. ☆89 · Updated this week
- Synthetic Data for LLM Fine-Tuning ☆113 · Updated last year