modelscope/MCPBench



The evaluation benchmark on MCP servers

☆249

Alternatives and similar repositories for MCPBench

Users who are interested in MCPBench are comparing it to the libraries listed below.


modelscope / mcp-central
Collection of model-centric MCP servers
☆27 · May 21, 2025 · Updated last year
modelscope / twinkle
Twinkle✨: Training workbench to make your model glow.
☆248 · Updated this week
Accenture / mcp-bench
MCP-Bench: Benchmarking Tool-Using LLM Agents with Complex Real-World Tasks via MCP Servers
☆492 · Oct 7, 2025 · Updated 9 months ago
XGenerationLab / XiYan-DateResolver
A date understanding and reasoning enhanced model
☆55 · Sep 3, 2025 · Updated 10 months ago
XGenerationLab / XiYanSQL-QwenCoder
XiYanSQL models for Text-to-SQL.
☆155 · Sep 3, 2025 · Updated 10 months ago
cxcscmu / deepresearch_benchmarking
☆29 · Mar 10, 2026 · Updated 4 months ago
xfey / MCP-Zero
MCP-Zero: Active Tool Discovery for Autonomous LLM Agents
☆500 · Jul 2, 2025 · Updated last year
modelscope / DiffSynth-Engine
☆425 · Jul 8, 2026 · Updated last week
modelscope / flowra
☆71 · Nov 24, 2025 · Updated 7 months ago
XGenerationLab / xiyan_mcp_server
A Model Context Protocol (MCP) server that enables natural language queries to databases
☆240 · Feb 11, 2026 · Updated 5 months ago
MTU-Bench-Team / MTU-Bench
MTU-Bench: A Multi-granularity Tool-Use Benchmark for Large Language Models
☆60 · Jul 24, 2025 · Updated 11 months ago
icip-cas / LiveMCPBench
LiveMCPBench is a benchmark for evaluating the ability of agents to navigate and utilize a large-scale MCP toolset. It provides a compreh…
☆103 · Dec 18, 2025 · Updated 7 months ago
Leezekun / SOPBench
The data and code for paper: "SOPBench: Evaluating Language Agents at Following Standard Operating Procedures and Constraints"
☆16 · Nov 17, 2025 · Updated 8 months ago
eval-sys / mcpmark
MCPMark is a comprehensive, stress-testing MCP benchmark designed to evaluate model and agent capabilities in real-world MCP use.
☆449 · Jun 12, 2026 · Updated last month
OpenHands / agent-analysis
A collection of scripts and tools for analyzing SWE agents.
☆16 · May 7, 2025 · Updated last year
StonyBrookNLP / appworld
🌍 AppWorld: A Controllable World of Apps and People for Benchmarking Function Calling and Interactive Coding Agent, ACL'24 Best Resource…
☆464 · Feb 17, 2026 · Updated 5 months ago
mcp-tool-bench / MCPToolBenchPP
MCPToolBench++ MCP Model Context Protocol Tool Use Benchmark on AI Agent and Model Tool Use Ability
☆44 · Mar 17, 2026 · Updated 4 months ago
axon-rl / gem
A Gym for Agentic LLMs
☆502 · Jan 21, 2026 · Updated 6 months ago
SalesforceAIResearch / MCP-Universe
MCP-Universe is a comprehensive framework designed for RL training, benchmarking, and developing AI agents for general tool-use.
☆592 · Jun 23, 2026 · Updated 3 weeks ago
modelscope / sirchmunk
🐿️ Sirchmunk: Raw data to self-evolving intelligence, real-time.
☆1,181 · Jun 18, 2026 · Updated last month
eigent-ai / toolathlon_gym
Toolathlon-Gym for testing AI agents' real-world tool-use capabilities across diverse MCP servers.
☆138 · Apr 2, 2026 · Updated 3 months ago
modelscope / awesome-deep-reasoning
Collect every awesome work about r1!
☆433 · May 2, 2025 · Updated last year
synw / modprompt
Prompt templates for language models
☆10 · Apr 7, 2026 · Updated 3 months ago
modelscope / mcore-bridge
MCore-Bridge: Providing Megatron-Core model definitions for state-of-the-art large models and making Megatron training as simple as Trans…
☆86 · Updated this week
ViktorAxelsen / BudgetMem
[ICML'26] Learning Query-Aware Budget-Tier Routing for Runtime Agent Memory
☆21 · Jun 10, 2026 · Updated last month
sierra-research / tau-bench
Code and Data for Tau-Bench
☆1,337 · Mar 18, 2026 · Updated 4 months ago
ryantzr1 / OpenAlita
Open Source Implementation of Alita: Generalist Agent Enabling Scalable Agentic Reasoning with Minimal Predefinition and Maximal Self-Evo…
☆100 · Jul 18, 2025 · Updated last year
Fu-Dayuan / AgentRefine
(ICLR 2025) AgentRefine: Enhancing Agent Generalization through Refinement Tuning
☆20 · Nov 22, 2025 · Updated 7 months ago
XGenerationLab / XiYan-SQL
A MULTI-GENERATOR ENSEMBLE FRAMEWORK FOR NATURAL LANGUAGE TO SQL
☆1,015 · May 18, 2026 · Updated 2 months ago
zjunlp / WorldMind
Aligning Agentic World Models via Knowledgeable Experience Learning
☆37 · May 15, 2026 · Updated 2 months ago
CharlesQ9 / Alita
☆881 · Aug 30, 2025 · Updated 10 months ago
langfengQ / verl-agent
verl-agent is an extension of veRL, designed for training LLM/VLM agents via RL. verl-agent is also the official code for paper "Group-in…
☆2,138 · Jun 9, 2026 · Updated last month
Victorwz / LaViA
☆10 · Jul 13, 2024 · Updated 2 years ago
verl-project / verl
verl/HybridFlow: A Flexible and Efficient RL Post-Training Framework
☆22,571 · Updated this week
Agent-RL / ReCall
ReSearch: Learning to Reason with Search for LLMs via Reinforcement Learning & ReCall: Learning to Reason with Tool Call for LLMs via Rei…
☆1,412 · May 16, 2025 · Updated last year
modelscope / ultron
Ultron: Collective Intelligence System — Shared Memories, Skills, and Harnesses Across Every Agent
☆164 · Jul 2, 2026 · Updated 2 weeks ago
WooooDyy / AgentGym
Code and implementations for the ACL 2025 paper "AgentGym: Evolving Large Language Model-based Agents across Diverse Environments" by Zhi…
☆813 · May 30, 2026 · Updated last month
ByteDance-Seed / WideSearch
WideSearch: Benchmarking Agentic Broad Info-Seeking
☆147 · Oct 9, 2025 · Updated 9 months ago
RUC-NLPIR / ET-Agent
☆20 · Jan 18, 2026 · Updated 6 months ago