Accenture/mcp-bench

MCP-Bench: Benchmarking Tool-Using LLM Agents with Complex Real-World Tasks via MCP Servers

☆496

Alternatives and similar repositories for mcp-bench

Users who are interested in mcp-bench are comparing it to the libraries listed below.

SalesforceAIResearch / MCP-Universe
MCP-Universe is a comprehensive framework designed for RL training, benchmarking, and developing AI agents for general tool-use.
☆592 · Jun 23, 2026 · Updated last month
icip-cas / LiveMCPBench
LiveMCPBench is a benchmark for evaluating the ability of agents to navigate and utilize a large-scale MCP toolset. It provides a compreh…
☆104 · Dec 18, 2025 · Updated 7 months ago
Coral-Protocol / Anemoi
Anemoi: A Semi-Centralized Multi-agent Systems Based on Agent-to-Agent Communication MCP server from Coral Protocol
☆370 · Aug 27, 2025 · Updated 11 months ago
modelscope / MCPBench
An evaluation benchmark for MCP servers
☆251 · Sep 3, 2025 · Updated 10 months ago
mcp-tool-bench / MCPToolBenchPP
MCPToolBench++: a Model Context Protocol (MCP) tool-use benchmark for evaluating the tool-use ability of AI agents and models
☆44 · Mar 17, 2026 · Updated 4 months ago
eval-sys / mcpmark
MCPMark is a comprehensive, stress-testing MCP benchmark designed to evaluate model and agent capabilities in real-world MCP use.
☆453 · Jun 12, 2026 · Updated last month
google-deepmind / limit
On the Theoretical Limitations of Embedding-Based Retrieval
☆651 · Sep 15, 2025 · Updated 10 months ago
scaleapi / mcp-atlas
MCP Atlas
☆130 · Updated this week
webup / langgraph-up-react
LangGraph template for a simple ReAct agent, with MCP tools support and robust test suites.
☆564 · Sep 28, 2025 · Updated 10 months ago
Danau5tin / multi-agent-coding-system
Reached #13 on Stanford's Terminal Bench leaderboard. Orchestrator, explorer & coder agents working together with intelligent context sha…
☆1,414 · Nov 3, 2025 · Updated 8 months ago
code-423n4 / 2025-10-hybra-finance
☆37 · Jan 16, 2026 · Updated 6 months ago
xjzzzzzzzz / MCPSafety
☆22 · Dec 18, 2025 · Updated 7 months ago
comfy-deploy / comfydeploy
ComfyDeployed
☆450 · Sep 19, 2025 · Updated 10 months ago
TencentCloudADP / youtu-agent
A simple yet powerful agent framework that delivers with open-source models
☆4,583 · Mar 21, 2026 · Updated 4 months ago
bytedance / USO
[CVPR 2026] 🔥🔥 Official Repo of USO: Unified Style and Subject-Driven Generation via Disentangled and Reward Learning
☆1,229 · Sep 12, 2025 · Updated 10 months ago
PurCL / ProSec
Official repo for "ProSec: Fortifying Code LLMs with Proactive Security Alignment"
☆18 · Feb 26, 2026 · Updated 5 months ago
googleapis / gcloud-mcp
gcloud MCP server
☆870 · Updated this week
docker / docker-agent
AI Agent Builder and Runtime by Docker Engineering
☆3,219 · Updated this week
hkust-nlp / Toolathlon
[ICLR 2026] The Tool Decathlon: Benchmarking Language Agents for Diverse, Realistic, and Long-Horizon Task Execution
☆441 · Updated this week
SalesforceAIResearch / MCPEval
MCP-based Agent Deep Evaluation System
☆155 · Jun 2, 2026 · Updated last month
ltzheng / SimpleTIR
[ICLR 2026] End-to-End Reinforcement Learning for Multi-Turn Tool-Integrated Reasoning
☆401 · Mar 30, 2026 · Updated 3 months ago
guanyilin428 / Dynamic-Speculative-Planning
☆48 · Sep 13, 2025 · Updated 10 months ago
zoecarver / saturn-arc
☆27 · Aug 16, 2025 · Updated 11 months ago
rasbt / reasoning-from-scratch
Implement a reasoning LLM in PyTorch from scratch, step by step
☆4,829 · Updated this week
evalops / cognitive-dissonance-dspy
A multi-agent LLM system for detecting and resolving cognitive dissonance.
☆282 · Apr 25, 2026 · Updated 3 months ago
GAIR-NLP / LIMI
LIMI: Less is More for Agency
☆162 · Oct 14, 2025 · Updated 9 months ago
oxbshw / LLM-Agents-Ecosystem-Handbook
One-stop handbook for building, deploying, and understanding LLM agents with 60+ skeletons, tutorials, ecosystem guides, and evaluation t…
☆538 · Jun 30, 2026 · Updated 3 weeks ago
THUNLP-MT / StableToolBench
A new tool learning benchmark aiming at well-balanced stability and reality, based on ToolBench.
☆238 · Apr 15, 2025 · Updated last year
TheAgentArk / Toucan
Official repo of Toucan: Synthesizing 1.5M Tool-Agentic Data from Real-World MCP Environments
☆260 · Dec 16, 2025 · Updated 7 months ago
ByteDance-Seed / seed-oss
☆888 · Sep 15, 2025 · Updated 10 months ago
firecrawl / open-agent-builder
🔥 Visual workflow builder for AI agents powered by Firecrawl - drag-and-drop web scraping pipelines with real-time execution
☆2,550 · Oct 20, 2025 · Updated 9 months ago
open-compass / GTA
[NeurIPS 2024 D&B] GTA: A Benchmark for General Tool Agents & [arXiv 2026] GTA-2
☆148 · Apr 20, 2026 · Updated 3 months ago
WujiangXu / EPO
Code for the paper "EPO: Entropy-regularized Policy Optimization for LLM Agents Reinforcement Learning"
☆40 · Jul 13, 2026 · Updated 2 weeks ago
ethz-spylab / agentdojo
A Dynamic Environment to Evaluate Attacks and Defenses for LLM Agents.
☆690 · Jun 2, 2026 · Updated last month
MoonshotAI / checkpoint-engine
Checkpoint-engine is a simple middleware to update model weights in LLM inference engines
☆984 · Jul 4, 2026 · Updated 3 weeks ago
vllm-project / semantic-router
Intelligent Mixture-of-Models Router for Efficient Heterogeneous LLM Inference
☆5,068 · Updated this week
sierra-research / tau2-bench
τ-Bench: A Benchmark for Tool-Agent-User Interaction in Real-World Domains
☆1,685 · Updated this week
clearloveclearlove / BEAT
☆15 · Feb 26, 2025 · Updated last year
QwenLM / Qwen3-ASR-Toolkit
Official Python toolkit for the Qwen3-ASR API. Parallel high‑throughput calls, robust long‑audio transcription, multi‑sample‑rate support…
☆983 · Feb 5, 2026 · Updated 5 months ago