The evaluation benchmark on MCP servers
☆243 · Sep 3, 2025 · Updated 6 months ago
Alternatives and similar repositories for MCPBench
Users who are interested in MCPBench are comparing it to the repositories listed below.
- Collection of model-centric MCP servers ☆26 · May 21, 2025 · Updated 10 months ago
- ☆387 · Updated this week
- a date understanding and reasoning enhanced model ☆52 · Sep 3, 2025 · Updated 6 months ago
- XiYanSQL models for Text-to-SQL. ☆149 · Sep 3, 2025 · Updated 6 months ago
- A Model Context Protocol (MCP) server that enables natural language queries to databases ☆232 · Feb 11, 2026 · Updated last month
- An MCP tool that gets things done for you ☆13 · Dec 22, 2024 · Updated last year
- Prompt templates for language models ☆10 · Feb 28, 2026 · Updated last month
- Aligning Agentic World Models via Knowledgeable Experience Learning ☆32 · Jan 25, 2026 · Updated 2 months ago
- MLLM @ Game ☆16 · May 12, 2025 · Updated 10 months ago
- MCPToolBench++ MCP Model Context Protocol Tool Use Benchmark on AI Agent and Model Tool Use Ability ☆41 · Mar 17, 2026 · Updated last week
- AgentProg: Empowering Long-Horizon GUI Agents with Program-Guided Context Management ☆25 · Mar 17, 2026 · Updated last week
- Initial commit ☆13 · Aug 14, 2023 · Updated 2 years ago
- An efficient multi-modal instruction-following data synthesis tool and the official implementation of Oasis https://arxiv.org/abs/2503.08… ☆40 · Jun 4, 2025 · Updated 9 months ago
- Collect every awesome work about r1! ☆429 · May 2, 2025 · Updated 10 months ago
- A MULTI-GENERATOR ENSEMBLE FRAMEWORK FOR NATURAL LANGUAGE TO SQL ☆982 · Feb 11, 2026 · Updated last month
- Query OpenAI models directly from Claude using MCP protocol. ☆79 · Nov 28, 2024 · Updated last year
- A Model Context Protocol server providing LLM Agents a second opinion via AI-powered Deepseek-Reasoning R1 mentorship capabilities, inclu… ☆31 · Jul 22, 2025 · Updated 8 months ago
- ☆59 · Feb 11, 2026 · Updated last month
- Task management for AI agents ☆15 · Jun 25, 2025 · Updated 9 months ago
- [ICLR 2026] Official Implementation of "FeatureBench: Benchmarking Agentic Coding for Complex Feature Development" ☆37 · Mar 3, 2026 · Updated 3 weeks ago
- TypeScript port of the original MCP Agent framework by lastmile-ai ☆17 · Sep 22, 2025 · Updated 6 months ago
- MS-Agent: a lightweight framework to empower agentic execution of complex tasks ☆4,095 · Updated this week
- [ACL'25 (Findings)] Explorer: Scaling Exploration-driven Web Trajectory Synthesis for Multimodal Web Agents ☆27 · Feb 17, 2026 · Updated last month
- Open Image Curation Tools ☆48 · Apr 22, 2025 · Updated 11 months ago
- Use PEFT or Full-parameter to CPT/SFT/DPO/GRPO 600+ LLMs (Qwen3.5, DeepSeek-R1, GLM-5, InternLM3, Llama4, ...) and 300+ MLLMs (Qwen3-VL, … ☆13,391 · Updated this week
- OCRVerse: Towards Holistic OCR in End-to-End Vision-Language Models ☆29 · Feb 4, 2026 · Updated last month
- [MTI-LLM@NeurIPS 2025] Official implementation of "PyVision: Agentic Vision with Dynamic Tooling." ☆154 · Jul 22, 2025 · Updated 8 months ago
- L1: Controlling How Long A Reasoning Model Thinks With Reinforcement Learning ☆264 · May 14, 2025 · Updated 10 months ago
- A streamlined and customizable framework for efficient large model (LLM, VLM, AIGC) evaluation and performance benchmarking. ☆2,563 · Updated this week
- ReSearch: Learning to Reason with Search for LLMs via Reinforcement Learning & ReCall: Learning to Reason with Tool Call for LLMs via Rei… ☆1,352 · May 16, 2025 · Updated 10 months ago
- R1-searcher: Incentivizing the Search Capability in LLMs via Reinforcement Learning ☆700 · Aug 5, 2025 · Updated 7 months ago
- Trending projects & awesome papers about data-centric llm studies. ☆40 · May 20, 2025 · Updated 10 months ago
- ☆14 · Oct 8, 2025 · Updated 5 months ago
- VeriWeb: Verifiable Long-Chain Web Benchmark for Agentic Information-Seeking ☆86 · Jan 21, 2026 · Updated 2 months ago
- ☆20 · Updated this week
- O1 Replication Journey ☆2,001 · Jan 14, 2025 · Updated last year
- A collection of scripts and tools for analyzing SWE agents. ☆16 · May 7, 2025 · Updated 10 months ago
- Official repository for ACL 2025 paper "ProcessBench: Identifying Process Errors in Mathematical Reasoning" ☆187 · May 20, 2025 · Updated 10 months ago
- Code and Data for Tau-Bench ☆1,140 · Mar 18, 2026 · Updated last week