Evaluation package that allows benchmarking of agentic AIs from various sources and frameworks by producing statistical results which can be compared across different use cases and datasets.
☆75 · May 19, 2026 · Updated 3 weeks ago
Alternatives and similar repositories for agent-quality-inspect
Users who are interested in agent-quality-inspect are comparing it to the libraries listed below.
- ☆13 · Jun 17, 2023 · Updated 2 years ago
- An open taxonomy and scoring framework for evaluating AI agent sandboxes: 7 defense layers, 7 threat categories, 3 evaluation dimensions,… ☆78 · May 19, 2026 · Updated 3 weeks ago
- A lightweight annotation standard that helps AI agents navigate codebases faster, with fewer file reads and tool calls ☆143 · May 2, 2026 · Updated last month
- A universal plugin framework for development tools that enables seamless browser-server communication and MCP (Model Context Protocol) in… ☆33 · Apr 27, 2026 · Updated last month
- The official implementation of the paper "AgentDyn: Are Your Agent Security Defenses Deployable in Real-World Dynamic Environments?" ☆60 · May 19, 2026 · Updated 3 weeks ago
- NTU Learn Downloader ☆18 · Nov 10, 2021 · Updated 4 years ago
- Open-source firewall for AI agents. Policy engine that audits and controls what OpenClaw, Claude Code, Cursor, Codex, and any AI tool can… ☆72 · Updated this week
- Cognithor · Agent OS: Local-first autonomous agent operating system. 19 LLM providers, 18 channels, 145 MCP tools, 6-tier memory, Agent P… ☆147 · Updated this week
- MCP server that bridges clients to a real browser through CDP and a companion extension. ☆239 · Updated this week
- [ACL'25 Findings] SWE-Dev is an SWE agent with a scalable test case construction pipeline. ☆60 · Jul 21, 2025 · Updated 10 months ago
- System audio capture + multi-provider ASR + local-first AI review workspace. Floating live captions, 12 ASR backends, 60+ languages, AI s… ☆226 · Jun 3, 2026 · Updated last week
- A Model Context Protocol (MCP) server for Langfuse, enabling AI agents to query Langfuse trace data for enhanced debugging and observabil… ☆96 · Jun 6, 2026 · Updated last week
- A practical AI agents handbook covering agent systems, agentic workflows, LangGraph, MCP/A2A, context engineering, agent memory, evaluati… ☆314 · Updated this week
- MCP Server with TMDB ☆75 · Jun 7, 2026 · Updated last week
- Autonomous orchestration framework for Claude Code with MemPalace-inspired memory (4-layer stack, 818-token wake-up), parallel-first Agen… ☆138 · Jun 7, 2026 · Updated last week
- SWE-PolyBench: A multi-language benchmark for repository-level evaluation of coding agents ☆85 · May 13, 2026 · Updated last month
- Alpha Vantage MCP Server ☆166 · May 29, 2026 · Updated 2 weeks ago
- Go framework for agentic AI apps with MCP and built-in tools ☆188 · Jun 6, 2026 · Updated last week
- The Google Ads MCP Server is an implementation of the Model Context Protocol (MCP) that enables Large Language Models (LLMs), such as Gem… ☆224 · Jun 4, 2026 · Updated last week
- The Rust SDK for building coding agents. Tool execution, LLM streaming, graph memory, sub-agent orchestration, MCP, as composable librar… ☆377 · Jun 5, 2026 · Updated last week
- Zero-instrumentation, system-level AI agent tracing in eBPF ☆430 · Updated this week
- 🦞 Official plugin for OpenClaw that exports agent traces to Opik. See and monitor agent behaviour, cost, tokens, errors and more. ☆619 · Updated this week
- Generate and evaluate agent skills for code agents like Claude Code, Open Code, OpenAI Codex ☆680 · May 26, 2026 · Updated 2 weeks ago
- OpenJudge: A Unified Framework for Holistic Evaluation and Quality Rewards ☆642 · May 29, 2026 · Updated 2 weeks ago
- This solution accelerator leverages Microsoft Foundry, Azure Content Understanding, Azure OpenAI Service, and Foundry IQ to enable organi… ☆453 · Updated this week
- Vibe-Skills is an all-in-one AI skills package. It seamlessly integrates expert-level capabilities and context management into a general-… ☆2,261 · May 19, 2026 · Updated 3 weeks ago
- TruthfulQA: Measuring How Models Imitate Human Falsehoods ☆925 · Jan 16, 2025 · Updated last year
- Model Context Protocol (MCP) server for Kubernetes and OpenShift ☆1,679 · Updated this week
- The missing DevTools for Claude Code: inspect session logs, tool calls, token usage, subagents, and context window in a visual UI. Free,… ☆3,557 · May 13, 2026 · Updated last month
- Modular Python framework for AI agents and workflows with chain-of-thought reasoning, tools, and memory. ☆2,539 · Updated this week
- The platform for LLM evaluations and AI agent testing ☆3,300 · Updated this week
- Your agent in your terminal, equipped with local tools: writes code, uses the terminal, browses the web. Make your own persistent autonom… ☆4,321 · Updated this week
- A curated list of awesome skills, hooks, slash-commands, agent orchestrators, applications, and plugins for Claude Code by Anthropic ☆46,185 · Apr 27, 2026 · Updated last month
- Automate the process of making money online. ☆30,883 · May 15, 2026 · Updated 3 weeks ago
- Robust Speech Recognition via Large-Scale Weak Supervision ☆102,585 · Apr 15, 2026 · Updated last month
- 21 Lessons, Get Started Building with Generative AI ☆111,819 · Updated this week