Evaluation package that allows benchmarking of agentic AIs from various sources and frameworks by producing statistical results which can be compared across different use cases and datasets.
☆75Apr 22, 2026Updated last month
Alternatives and similar repositories for agent-quality-inspect
Users who are interested in agent-quality-inspect are comparing it to the libraries listed below.
- ☆13Jun 17, 2023Updated 2 years ago
- An open taxonomy and scoring framework for evaluating AI agent sandboxes: 7 defense layers, 7 threat categories, 3 evaluation dimensions,…☆68Apr 14, 2026Updated last month
- A lightweight annotation standard that helps AI agents navigate codebases faster, with fewer file reads and tool calls☆137May 2, 2026Updated 3 weeks ago
- A universal plugin framework for development tools that enables seamless browser-server communication and MCP (Model Context Protocol) in…☆33Apr 27, 2026Updated 3 weeks ago
- The official implementation of the paper "AgentDyn: Are Your Agent Security Defenses Deployable in Real-World Dynamic Environments?"☆54Updated this week
- NTU Learn Downloader☆18Nov 10, 2021Updated 4 years ago
- MCP server that bridges clients to a real browser through CDP and a companion extension.☆131May 17, 2026Updated last week
- Open-source firewall for AI agents. Policy engine that audits and controls what OpenClaw, Claude Code, Cursor, Codex, and any AI tool can…☆68May 12, 2026Updated last week
- Cognithor · Agent OS: Local-first autonomous agent operating system. 19 LLM providers, 18 channels, 145 MCP tools, 6-tier memory, Agent P…☆139Updated this week
- System audio capture + multi-provider ASR + local-first AI review workspace. Floating live captions, 12 ASR backends, 60+ languages, AI s…☆174Updated this week
- [ACL25' Findings] SWE-Dev is an SWE agent with a scalable test case construction pipeline.☆60Jul 21, 2025Updated 10 months ago
- A practical AI agents handbook covering agent systems, agentic workflows, LangGraph, MCP/A2A, context engineering, agent memory, evaluati…☆202May 17, 2026Updated last week
- A Model Context Protocol (MCP) server for Langfuse, enabling AI agents to query Langfuse trace data for enhanced debugging and observabil…☆88Updated this week
- MCP Server with TMDB☆73May 14, 2026Updated last week
- Autonomous orchestration framework for Claude Code with MemPalace-inspired memory (4-layer stack, 818-token wake-up), parallel-first Agen…☆136Apr 20, 2026Updated last month
- SWE-PolyBench: A multi-language benchmark for repository level evaluation of coding agents☆84May 13, 2026Updated last week
- Go framework for agentic AI app with MCP and built-in tools☆186Updated this week
- The Google Ads MCP Server is an implementation of the Model Context Protocol (MCP) that enables Large Language Models (LLMs), such as Gem…☆206Updated this week
- The Rust SDK for building coding agents. Tool execution, LLM streaming, graph memory, sub-agent orchestration, MCP — as composable librar…☆357Updated this week
- Zero-instrumentation LLM and AI agent (e.g. Claude Code, gemini-cli) observability in eBPF☆336Updated this week
- 🦞 Official plugin for OpenClaw that exports agent traces to Opik. See and monitor agent behaviour, cost, tokens, errors and more.☆615Updated this week
- Generate and evaluate agent skills for code agents like Claude Code, Open Code, OpenAI Codex☆627May 11, 2026Updated last week
- OpenJudge: A Unified Framework for Holistic Evaluation and Quality Rewards☆619Updated this week
- This solution accelerator leverages Microsoft Foundry, Azure Content Understanding, Azure OpenAI Service, and Foundry IQ to enable organi…☆446Updated this week
- Vibe-Skills is an all-in-one AI skills package. It seamlessly integrates expert-level capabilities and context management into a general-…☆2,102May 8, 2026Updated 2 weeks ago
- TruthfulQA: Measuring How Models Imitate Human Falsehoods☆919Jan 16, 2025Updated last year
- Model Context Protocol (MCP) server for Kubernetes and OpenShift☆1,609Updated this week
- The missing DevTools for Claude Code — inspect session logs, tool calls, token usage, subagents, and context window in a visual UI. Free,…☆3,429May 13, 2026Updated last week
- Modular Python framework for AI agents and workflows with chain-of-thought reasoning, tools, and memory.☆2,527May 13, 2026Updated last week
- The platform for LLM evaluations and AI agent testing☆3,265Updated this week
- Your agent in your terminal, equipped with local tools: writes code, uses the terminal, browses the web. Make your own persistent autonom…☆4,308Updated this week
- A curated list of awesome skills, hooks, slash-commands, agent orchestrators, applications, and plugins for Claude Code by Anthropic☆44,304Apr 27, 2026Updated 3 weeks ago
- Automate the process of making money online.☆30,526May 15, 2026Updated last week
- Robust Speech Recognition via Large-Scale Weak Supervision☆99,901Apr 15, 2026Updated last month
- 21 Lessons, Get Started Building with Generative AI☆110,995May 14, 2026Updated last week