SAP/agent-quality-inspect

Evaluation package that allows benchmarking of agentic AIs from various sources and frameworks by producing statistical results which can be compared across different use cases and datasets.

☆78

Alternatives and similar repositories for agent-quality-inspect

Users that are interested in agent-quality-inspect are comparing it to the libraries listed below.

botanu-ai / botanu-sdk-python
SDK to track cost-per-outcome for AI workflows
☆18 · Apr 25, 2026 · Updated 2 months ago
Zhengsh123 / PHYSICS
Official GitHub repo for Scaling Physical Reasoning with the PHYSICS Dataset (NeurIPS25).
☆16 · Sep 20, 2025 · Updated 10 months ago
smigolsmigol / llmkit
Know what your AI agents cost. API gateway with budget enforcement, session tracking, and MCP tools.
☆16 · Updated this week
IBM / mlflow-watsonml
MLflow deployment plugin for IBM-cloud-watson-ml
☆15 · May 7, 2025 · Updated last year
SALT-NLP / CoAnnotating
This is the official repository for "CoAnnotating: Uncertainty-Guided Work Allocation between Human and Large Language Models for Data An…
☆24 · Oct 26, 2023 · Updated 2 years ago
foundation-model-stack / fms-hf-tuning
🚀 Collection of tuning recipes with HuggingFace SFTTrainer and PyTorch FSDP.
☆58 · Updated this week
wordbricks / onequery
One interface for your whole data stack, with built-in safeguards and a simpler workflow for your team. Written in Rust. Docker not neede…
☆17 · Updated this week
meshkovQA / Eval-ai-library
Comprehensive AI Model Evaluation Framework with advanced techniques including Temperature-Controlled Verdict Aggregation via Generalized…
☆44 · Jul 10, 2026 · Updated 2 weeks ago
The-AI-Alliance / cube-harness
Drive OSS standards and tools for data curation and evaluation creation for state of the art AI agents
☆54 · Jul 17, 2026 · Updated last week
hkochar / openclaw-deck
Self-hosted dashboard for OpenClaw AI agents. Cost tracking, budget enforcement, session replay, config safety.
☆16 · Mar 10, 2026 · Updated 4 months ago
loplop-h / spent
Claude Code session cost tracker. Efficiency score, productive vs wasted breakdown, live terminal dashboard.
☆18 · Apr 4, 2026 · Updated 3 months ago
mcpchecker / mcpchecker
☆23 · Updated this week
IBM / ReActXen
This is a base-react agent for AssetOpsBench
☆25 · Jul 4, 2026 · Updated 2 weeks ago
SeldonIO / trtis-k8s-scheduler
Custom Scheduler to deploy ML models to TRTIS for GPU Sharing
☆12 · Apr 1, 2020 · Updated 6 years ago
sps014 / Adk-cs
An open-source, code-first C# (Dotnet) toolkit for building, evaluating, and deploying sophisticated AI agents with flexibility and contr…
☆24 · Jul 11, 2026 · Updated last week
shengchaochen82 / FFTS
[AAAI'25] The implementation of paper "Federated Foundation Models on Heterogeneous Time Series" | The first work to explore time series …
☆24 · May 10, 2026 · Updated 2 months ago
SilentFleetKK / computecfo
🏦 ComputeCFO — Your AI Financial Officer. Track, analyze, and optimize LLM API spending. Budget controls, ROI analysis, cost prediction.
☆22 · Jul 9, 2026 · Updated 2 weeks ago
Lucew / changepoynt
Efficient and readable change point detection package implemented in Python. (Singular Spectrum Transformation - SST, IKA-SST, ulSIF, RuL…
☆35 · Jun 19, 2026 · Updated last month
lalaliat / Agent-Oriented-Planning
☆26 · Feb 28, 2025 · Updated last year
aws-samples / sample-genai-on-eks-starter-kit
A comprehensive toolkit for deploying production-ready Generative AI infrastructure on Amazon EKS. Includes pre-configured components for…
☆86 · Updated this week
ricardoevvargas / awesome-industry40-datasets
A curated list of public datasets related to Industry 4.0.
☆23 · Nov 4, 2022 · Updated 3 years ago
OpenFunction / functions-framework-go
Go functions framework for OpenFunction
☆18 · Jun 17, 2024 · Updated 2 years ago
project-codeflare / codeflare-sdk
An intuitive, easy-to-use python interface for batch resource requesting, access, job submission, and observation. Simplifying the develo…
☆35 · Updated this week
liyecom / liye-ai
Agent economic trading market infrastructure. A system used for managing Agents, discovering Agents, scheduling Agents, evaluating Agents…
☆33 · Jul 17, 2026 · Updated last week
samyama-ai / samyama-graph
Graph-vector database that queried 1 billion edges for $2.50. Rust, OpenCypher, vector search, 14 graph algorithms. 74M nodes / 1B edges …
☆84 · Updated this week
arrismo / kaggle-mcp
MCP server for Kaggle
☆39 · May 21, 2026 · Updated 2 months ago
avivsinai / langfuse-mcp
A Model Context Protocol (MCP) server for Langfuse, enabling AI agents to query Langfuse trace data for enhanced debugging and observabil…
☆100 · Jun 27, 2026 · Updated 3 weeks ago
anadim / llm-benchmark-matrix
Cited 83-model x 49-benchmark LLM evaluation matrix with 18 matrix completion methods
☆39 · Feb 25, 2026 · Updated 4 months ago
alex000kim / skypilot-code-sandbox
A self-hosted, secure code execution sandbox for LLM agents deployed on your cloud infrastructure using SkyPilot. Built on llm-sandbox fo…
☆17 · Jul 20, 2025 · Updated last year
primeqa / clapnq
☆46 · Jan 21, 2025 · Updated last year
Froot-NetSys / NetArena
☆39 · Apr 6, 2026 · Updated 3 months ago
gardener / autoscaler
Customised fork of cluster-autoscaler to support machine-controller-manager
☆17 · Jun 17, 2026 · Updated last month
dapr / durabletask-go
The Durable Task Framework is a lightweight, embeddable engine for writing durable, fault-tolerant business logic (orchestrations) as ord…
☆16 · Updated this week
briqt / agent-usage
Lightweight cross-platform AI coding agent usage & cost tracker. Single binary, SQLite, web dashboard. | Lightweight cross-platform tracker for AI coding tool usage and cost; single binary, SQLit…
☆27 · Jul 17, 2026 · Updated last week
GAIR-NLP / AlphaEval
☆43 · May 4, 2026 · Updated 2 months ago
bastienjacquet / CudaDepthMapIntegration
Depth map integration using VTK and Cuda
☆15 · Sep 11, 2017 · Updated 8 years ago
Red-Hat-AI-Innovation-Team / its_hub
A Python library for inference-time scaling LLMs
☆36 · Updated this week
scivision / em-sfm
Lorenzo Torresani's Structure from Motion Matlab code
☆13 · Aug 1, 2021 · Updated 4 years ago
JamesonRGrieve / ServerFramework
Automatically generate database models, GraphQL schema, REST endpoints, MCPs, SDKs and tests from Pydantic models.
☆29 · May 13, 2026 · Updated 2 months ago