The evaluation benchmark on MCP servers
☆246 · Sep 3, 2025 · Updated 8 months ago
Alternatives and similar repositories for MCPBench
Users who are interested in MCPBench are comparing it to the libraries listed below.
- Collection of model-centric MCP servers ☆26 · May 21, 2025 · Updated 11 months ago
- a date understanding and reasoning enhanced model ☆52 · Sep 3, 2025 · Updated 8 months ago
- A third-party component library based on Gradio. Integrates Ant Design, Ant Design X, Monaco Editor and more advanced components to help… ☆142 · Apr 22, 2026 · Updated 3 weeks ago
- A Model Context Protocol (MCP) server that enables natural language queries to databases ☆235 · Feb 11, 2026 · Updated 3 months ago
- ☆64 · Apr 17, 2026 · Updated last month
- An MCP tool that gets things done for you ☆13 · Dec 22, 2024 · Updated last year
- Aligning Agentic World Models via Knowledgeable Experience Learning ☆32 · Jan 25, 2026 · Updated 3 months ago
- MTU-Bench: A Multi-granularity Tool-Use Benchmark for Large Language Models ☆60 · Jul 24, 2025 · Updated 9 months ago
- MLLM @ Game ☆16 · May 12, 2025 · Updated last year
- AgentProg: Empowering Long-Horizon GUI Agents with Program-Guided Context Management ☆30 · Apr 10, 2026 · Updated last month
- (ACL 2025) Divide-Then-Aggregate: An Efficient Tool Learning Method via Parallel Tool Invocation ☆12 · May 21, 2025 · Updated 11 months ago
- Collect every awesome work about r1! ☆432 · May 2, 2025 · Updated last year
- ☆17 · Feb 27, 2025 · Updated last year
- 🤖 Reddit-infuriating, AI-powered Shell scripts using Claude Code SDK. essentially an ADAS (Automated Design of Agentic Systems) implemen… ☆31 · Sep 17, 2025 · Updated 8 months ago
- Query OpenAI models directly from Claude using MCP protocol. ☆81 · Nov 28, 2024 · Updated last year
- A MULTI-GENERATOR ENSEMBLE FRAMEWORK FOR NATURAL LANGUAGE TO SQL ☆1,000 · May 5, 2026 · Updated 2 weeks ago
- A Model Context Protocol server providing LLM Agents a second opinion via AI-powered Deepseek-Reasoning R1 mentorship capabilities, inclu… ☆31 · Jul 22, 2025 · Updated 9 months ago
- ☆62 · May 5, 2026 · Updated 2 weeks ago
- Task management for AI agents ☆16 · Jun 25, 2025 · Updated 10 months ago
- TypeScript port of the original MCP Agent framework by lastmile-ai ☆17 · Sep 22, 2025 · Updated 7 months ago
- MS-Agent: a lightweight framework to empower agentic execution of complex tasks ☆4,246 · Apr 15, 2026 · Updated last month
- Sotopia-RL: Reward Design for Social Intelligence ☆50 · Apr 1, 2026 · Updated last month
- Code for Estimating Multi-cause Treatment Effects via Single-cause Perturbation (NeurIPS 2021) ☆14 · Jan 5, 2022 · Updated 4 years ago
- An open-ended eval framework for mcp.run tools ☆22 · May 22, 2025 · Updated 11 months ago
- Use PEFT or Full-parameter to CPT/SFT/DPO/GRPO 600+ LLMs (Qwen3.6, DeepSeek-R1, GLM-5.1, InternLM3, Llama4, ...) and 300+ MLLMs (Qwen3-VL… ☆14,122 · Updated this week
- A daemon that makes a desktop OS accessible to AI agents ☆40 · May 29, 2025 · Updated 11 months ago
- A compiler for Tiger language includes lexical analysis using flexc++, parsing using Bisonc++, type checking, building abstract syntax tr… ☆13 · Jan 18, 2023 · Updated 3 years ago
- PyTorch distributed training acceleration framework ☆55 · Aug 13, 2025 · Updated 9 months ago
- ReSearch: Learning to Reason with Search for LLMs via Reinforcement Learning & ReCall: Learning to Reason with Tool Call for LLMs via Rei… ☆1,383 · May 16, 2025 · Updated last year
- A streamlined and customizable framework for efficient large model (LLM, VLM, AIGC) evaluation and performance benchmarking. ☆2,796 · Updated this week
- R1-searcher: Incentivizing the Search Capability in LLMs via Reinforcement Learning ☆712 · Aug 5, 2025 · Updated 9 months ago
- Trending projects & awesome papers about data-centric llm studies. ☆40 · May 20, 2025 · Updated 11 months ago
- Eval exercises for Roo Code. ☆77 · Jun 9, 2025 · Updated 11 months ago
- VeriWeb: Verifiable Long-Chain Web Benchmark for Agentic Information-Seeking ☆88 · Jan 21, 2026 · Updated 3 months ago
- ☆21 · May 10, 2026 · Updated last week
- O1 Replication Journey ☆2,000 · Jan 14, 2025 · Updated last year
- A collection of scripts and tools for analyzing SWE agents. ☆16 · May 7, 2025 · Updated last year
- Official repository for ACL 2025 paper "ProcessBench: Identifying Process Errors in Mathematical Reasoning" ☆190 · May 20, 2025 · Updated 11 months ago
- Code and Data for Tau-Bench ☆1,229 · Mar 18, 2026 · Updated 2 months ago