An evaluation package for benchmarking agentic AIs from various sources and frameworks. It produces statistical results that can be compared across different use cases and datasets.
☆77 · Jun 26, 2026 · Updated last week
Alternatives and similar repositories for agent-quality-inspect
Users who are interested in agent-quality-inspect are comparing it to the libraries listed below.
- ☆14 · Jun 17, 2023 · Updated 3 years ago
- An open taxonomy and scoring framework for evaluating AI agent sandboxes: 7 defense layers, 7 threat categories, 3 evaluation dimensions,… ☆84 · Jun 10, 2026 · Updated 3 weeks ago
- A lightweight annotation standard that helps AI agents navigate codebases faster, with fewer file reads and tool calls ☆142 · May 2, 2026 · Updated 2 months ago
- A universal plugin framework for development tools that enables seamless browser-server communication and MCP (Model Context Protocol) in… ☆33 · Apr 27, 2026 · Updated 2 months ago
- The official implementation of the paper "AgentDyn: Are Your Agent Security Defenses Deployable in Real-World Dynamic Environments?" ☆64 · May 19, 2026 · Updated last month
- NTU Learn Downloader ☆18 · Nov 10, 2021 · Updated 4 years ago
- Open-source firewall for AI agents. Policy engine that audits and controls what OpenClaw, Claude Code, Cursor, Codex, and any AI tool can… ☆73 · Jun 17, 2026 · Updated 2 weeks ago
- Cognithor · Agent OS: Local-first autonomous agent operating system. 19 LLM providers, 18 channels, 145 MCP tools, 6-tier memory, Agent P… ☆150 · Updated this week
- MCP server that bridges clients to a real browser through CDP and a companion extension. ☆263 · Jun 22, 2026 · Updated last week
- [ACL'25 Findings] SWE-Dev is an SWE agent with a scalable test case construction pipeline. ☆61 · Jul 21, 2025 · Updated 11 months ago
- System audio capture + multi-provider ASR + local-first AI review workspace. Floating live captions, 12 ASR backends, 60+ languages, AI s… ☆240 · Jun 3, 2026 · Updated last month
- A Model Context Protocol (MCP) server for Langfuse, enabling AI agents to query Langfuse trace data for enhanced debugging and observabil… ☆97 · Updated this week
- MCP Server with TMDB ☆75 · Jun 7, 2026 · Updated 3 weeks ago
- A practical AI agents handbook covering agent systems, agentic workflows, LangGraph, MCP/A2A, context engineering, agent memory, evaluati… ☆325 · Updated this week
- Autonomous orchestration framework for Claude Code with MemPalace-inspired memory (4-layer stack, 818-token wake-up), parallel-first Agen… ☆139 · Jun 22, 2026 · Updated last week
- SWE-PolyBench: A multi-language benchmark for repository level evaluation of coding agents ☆85 · Jun 16, 2026 · Updated 2 weeks ago
- Alpha Vantage MCP Server ☆178 · Jun 26, 2026 · Updated last week
- Go framework for agentic AI app with MCP and built-in tools ☆190 · Jun 17, 2026 · Updated 2 weeks ago
- The Google Ads MCP Server is an implementation of the Model Context Protocol (MCP) that enables Large Language Models (LLMs), such as Gem… ☆233 · Jun 22, 2026 · Updated last week
- The Rust SDK for building coding agents. Tool execution, LLM streaming, graph memory, sub-agent orchestration, MCP — as composable librar… ☆390 · Jun 23, 2026 · Updated last week
- System-level AI agent profiling/tracing tool and skills in eBPF ☆488 · Jun 26, 2026 · Updated last week
- 🦞 Official plugin for OpenClaw that exports agent traces to Opik. See and monitor agent behaviour, cost, tokens, errors and more. ☆631 · Jun 22, 2026 · Updated last week
- Generate and evaluate agent skills for code agents like Claude Code, Open Code, OpenAI Codex ☆698 · May 26, 2026 · Updated last month
- OpenJudge: A Unified Framework for Holistic Evaluation and Quality Rewards ☆689 · Jun 17, 2026 · Updated 2 weeks ago
- This solution accelerator leverages Microsoft Foundry, Azure Content Understanding, Azure OpenAI Service, and Foundry IQ to enable organi… ☆458 · Updated this week
- Vibe-Skills is an all-in-one AI skills package. It seamlessly integrates expert-level capabilities and context management into a general-… ☆2,349 · May 19, 2026 · Updated last month
- TruthfulQA: Measuring How Models Imitate Human Falsehoods ☆929 · Jan 16, 2025 · Updated last year
- Model Context Protocol (MCP) server for Kubernetes and OpenShift ☆1,721 · Jun 26, 2026 · Updated last week
- The missing DevTools for Claude Code — inspect session logs, tool calls, token usage, subagents, and context window in a visual UI. Free,… ☆3,616 · May 13, 2026 · Updated last month
- Modular Python framework for AI agents and workflows with chain-of-thought reasoning, tools, and memory. ☆2,549 · Updated this week
- The platform for LLM evaluations and AI agent testing ☆3,313 · Jun 26, 2026 · Updated last week
- Your agent in your terminal, equipped with local tools: writes code, uses the terminal, browses the web. Make your own persistent autonom… ☆4,343 · Updated this week
- A curated list of awesome skills, hooks, slash-commands, agent orchestrators, applications, and plugins for Claude Code by Anthropic ☆47,680 · Updated this week
- Automate the process of making money online. ☆31,058 · Jun 14, 2026 · Updated 2 weeks ago
- Robust Speech Recognition via Large-Scale Weak Supervision ☆103,646 · Apr 15, 2026 · Updated 2 months ago
- 21 Lessons, Get Started Building with Generative AI ☆112,420 · Jun 25, 2026 · Updated last week