The evaluation benchmark on MCP servers
☆247 · Sep 3, 2025 · Updated 9 months ago
Alternatives and similar repositories for MCPBench
Users who are interested in MCPBench are comparing it to the repositories listed below.
- Collection of model-centric MCP servers ☆26 · May 21, 2025 · Updated last year
- a date understanding and reasoning enhanced model ☆53 · Sep 3, 2025 · Updated 9 months ago
- XiYanSQL models for Text-to-SQL. ☆153 · Sep 3, 2025 · Updated 9 months ago
- A Model Context Protocol (MCP) server that enables natural language queries to databases ☆237 · Feb 11, 2026 · Updated 3 months ago
- ☆72 · Apr 17, 2026 · Updated last month
- An MCP tool that gets things done for you ☆13 · Dec 22, 2024 · Updated last year
- a web logging proxy for MCP client-server communication ☆30 · May 29, 2026 · Updated last week
- Prompt templates for language models ☆10 · Apr 7, 2026 · Updated 2 months ago
- Aligning Agentic World Models via Knowledgeable Experience Learning ☆35 · May 15, 2026 · Updated 3 weeks ago
- MTU-Bench: A Multi-granularity Tool-Use Benchmark for Large Language Models ☆60 · Jul 24, 2025 · Updated 10 months ago
- Initial commit ☆13 · Aug 14, 2023 · Updated 2 years ago
- (ACL 2025) Divide-Then-Aggregate: An Efficient Tool Learning Method via Parallel Tool Invocation ☆12 · May 21, 2025 · Updated last year
- An efficient multi-modal instruction-following data synthesis tool and the official implementation of Oasis https://arxiv.org/abs/2503.08… ☆40 · Jun 4, 2025 · Updated last year
- Collect every awesome work about r1! ☆432 · May 2, 2025 · Updated last year
- ☆17 · Feb 27, 2025 · Updated last year
- A MULTI-GENERATOR ENSEMBLE FRAMEWORK FOR NATURAL LANGUAGE TO SQL ☆1,005 · May 18, 2026 · Updated 3 weeks ago
- A Model Context Protocol server providing LLM Agents a second opinion via AI-powered Deepseek-Reasoning R1 mentorship capabilities, inclu… ☆31 · Jul 22, 2025 · Updated 10 months ago
- ☆64 · May 5, 2026 · Updated last month
- 🌍 AppWorld: A Controllable World of Apps and People for Benchmarking Function Calling and Interactive Coding Agent, ACL'24 Best Resource… ☆437 · Feb 17, 2026 · Updated 3 months ago
- TypeScript port of the original MCP Agent framework by lastmile-ai ☆17 · Sep 22, 2025 · Updated 8 months ago
- MS-Agent: a lightweight framework to empower agentic execution of complex tasks ☆4,286 · Apr 15, 2026 · Updated last month
- [ACL'25 (Findings)] Explorer: Scaling Exploration-driven Web Trajectory Synthesis for Multimodal Web Agents ☆29 · Feb 17, 2026 · Updated 3 months ago
- Sotopia-RL: Reward Design for Social Intelligence ☆50 · Apr 1, 2026 · Updated 2 months ago
- Code for Estimating Multi-cause Treatment Effects via Single-cause Perturbation (NeurIPS 2021) ☆14 · Jan 5, 2022 · Updated 4 years ago
- Use PEFT or Full-parameter to CPT/SFT/DPO/GRPO 600+ LLMs (Qwen3.6, DeepSeek-V4, GLM-5.1, InternLM3, Llama4, ...) and 300+ MLLMs (Qwen3-VL… ☆14,424 · Updated this week
- [MTI-LLM@NeurIPS 2025] Official implementation of "PyVision: Agentic Vision with Dynamic Tooling." ☆161 · Jul 22, 2025 · Updated 10 months ago
- A daemon that makes a desktop OS accessible to AI agents ☆41 · May 29, 2025 · Updated last year
- L1: Controlling How Long A Reasoning Model Thinks With Reinforcement Learning ☆262 · May 14, 2025 · Updated last year
- ReSearch: Learning to Reason with Search for LLMs via Reinforcement Learning & ReCall: Learning to Reason with Tool Call for LLMs via Rei… ☆1,387 · May 16, 2025 · Updated last year
- A streamlined and customizable framework for efficient large model (LLM, VLM, AIGC) evaluation and performance benchmarking. ☆2,902 · Updated this week
- R1-searcher: Incentivizing the Search Capability in LLMs via Reinforcement Learning ☆716 · Aug 5, 2025 · Updated 10 months ago
- Trending projects & awesome papers about data-centric llm studies. ☆40 · May 20, 2025 · Updated last year
- Eval exercises for Roo Code. ☆78 · Jun 9, 2025 · Updated 11 months ago
- VeriWeb: Verifiable Long-Chain Web Benchmark for Agentic Information-Seeking ☆89 · Jan 21, 2026 · Updated 4 months ago
- A collection of scripts and tools for analyzing SWE agents. ☆16 · May 7, 2025 · Updated last year
- Official repository for ACL 2025 paper "ProcessBench: Identifying Process Errors in Mathematical Reasoning" ☆190 · May 20, 2025 · Updated last year
- Provides summarised output from various actions that could otherwise eat up tokens and cause crashes for AI agents ☆37 · Jun 15, 2025 · Updated 11 months ago
- MLGym A New Framework and Benchmark for Advancing AI Research Agents ☆601 · Aug 10, 2025 · Updated 9 months ago
- Code and Data for Tau-Bench ☆1,255 · Mar 18, 2026 · Updated 2 months ago