The evaluation benchmark on MCP servers
☆247 · Sep 3, 2025 · Updated 9 months ago
Alternatives and similar repositories for MCPBench
Users that are interested in MCPBench are comparing it to the libraries listed below.
- Collection of model-centric MCP servers ☆26 · May 21, 2025 · Updated last year
- ☆419 · Jun 12, 2026 · Updated 2 weeks ago
- XiYanSQL models for Text-to-SQL. ☆154 · Sep 3, 2025 · Updated 9 months ago
- A Model Context Protocol (MCP) server that enables natural language queries to databases ☆236 · Feb 11, 2026 · Updated 4 months ago
- ☆79 · Jun 19, 2026 · Updated last week
- An MCP tool that gets things done for you ☆13 · Dec 22, 2024 · Updated last year
- A web logging proxy for MCP client-server communication ☆30 · May 29, 2026 · Updated 3 weeks ago
- Prompt templates for language models ☆10 · Apr 7, 2026 · Updated 2 months ago
- Aligning Agentic World Models via Knowledgeable Experience Learning ☆36 · May 15, 2026 · Updated last month
- MTU-Bench: A Multi-granularity Tool-Use Benchmark for Large Language Models ☆60 · Jul 24, 2025 · Updated 11 months ago
- MLLM @ Game ☆17 · May 12, 2025 · Updated last year
- AgentProg: Empowering Long-Horizon GUI Agents with Program-Guided Context Management ☆31 · Apr 10, 2026 · Updated 2 months ago
- Initial commit ☆13 · Aug 14, 2023 · Updated 2 years ago
- (ACL 2025) Divide-Then-Aggregate: An Efficient Tool Learning Method via Parallel Tool Invocation ☆12 · May 21, 2025 · Updated last year
- Collect every awesome work about r1! ☆432 · May 2, 2025 · Updated last year
- 🤖 Reddit-infuriating, AI-powered Shell scripts using Claude Code SDK. Essentially an ADAS (Automated Design of Agentic Systems) implemen… ☆32 · Jun 8, 2026 · Updated 2 weeks ago
- A MULTI-GENERATOR ENSEMBLE FRAMEWORK FOR NATURAL LANGUAGE TO SQL ☆1,013 · May 18, 2026 · Updated last month
- A simple script generator for launching your Minecraft server with optimal startup parameters. ☆11 · Apr 6, 2023 · Updated 3 years ago
- A Model Context Protocol server providing LLM Agents a second opinion via AI-powered Deepseek-Reasoning R1 mentorship capabilities, inclu… ☆30 · Jul 22, 2025 · Updated 11 months ago
- ☆65 · May 5, 2026 · Updated last month
- 🌍 AppWorld: A Controllable World of Apps and People for Benchmarking Function Calling and Interactive Coding Agent, ACL'24 Best Resource… ☆444 · Feb 17, 2026 · Updated 4 months ago
- MS-Agent: a lightweight framework to empower agentic execution of complex tasks ☆4,320 · Updated this week
- [ACL'25 (Findings)] Explorer: Scaling Exploration-driven Web Trajectory Synthesis for Multimodal Web Agents ☆29 · Feb 17, 2026 · Updated 4 months ago
- Sotopia-RL: Reward Design for Social Intelligence ☆52 · Apr 1, 2026 · Updated 2 months ago
- Code for Estimating Multi-cause Treatment Effects via Single-cause Perturbation (NeurIPS 2021) ☆14 · Jan 5, 2022 · Updated 4 years ago
- Open Image Curation Tools ☆47 · Apr 22, 2025 · Updated last year
- Use PEFT or Full-parameter to CPT/SFT/DPO/GRPO 600+ LLMs (Qwen3.6, DeepSeek-V4, GLM-5.1, InternLM3, Llama4, ...) and 300+ MLLMs (Qwen3-VL… ☆14,633 · Updated this week
- PyTorch distributed training acceleration framework ☆56 · Aug 13, 2025 · Updated 10 months ago
- ReSearch: Learning to Reason with Search for LLMs via Reinforcement Learning & ReCall: Learning to Reason with Tool Call for LLMs via Rei… ☆1,399 · May 16, 2025 · Updated last year
- A streamlined and customizable framework for efficient large model (LLM, VLM, AIGC) evaluation and performance benchmarking. ☆2,992 · Updated this week
- R1-searcher: Incentivizing the Search Capability in LLMs via Reinforcement Learning ☆719 · Aug 5, 2025 · Updated 10 months ago
- ☆13 · Oct 8, 2025 · Updated 8 months ago
- Trending projects & awesome papers about data-centric llm studies. ☆40 · May 20, 2025 · Updated last year
- VeriWeb: Verifiable Long-Chain Web Benchmark for Agentic Information-Seeking ☆88 · Jan 21, 2026 · Updated 5 months ago
- ☆23 · Jun 12, 2026 · Updated 2 weeks ago
- A collection of scripts and tools for analyzing SWE agents. ☆16 · May 7, 2025 · Updated last year
- O1 Replication Journey ☆2,001 · Jan 14, 2025 · Updated last year
- Official repository for ACL 2025 paper "ProcessBench: Identifying Process Errors in Mathematical Reasoning" ☆191 · May 20, 2025 · Updated last year
- Provides summarised output from various actions that could otherwise eat up tokens and cause crashes for AI agents ☆36 · Jun 15, 2025 · Updated last year