eval-sys / mcpmark
MCPMark is a comprehensive, stress-testing MCP benchmark designed to evaluate model and agent capabilities in real-world MCP use.
☆357Updated last week
Alternatives and similar repositories for mcpmark
Users who are interested in mcpmark are comparing it to the repositories listed below.
- Official repo of Toucan: Synthesizing 1.5M Tool-Agentic Data from Real-World MCP Environments☆194Updated 3 weeks ago
- Deep Research☆303Updated 4 months ago
- MrlX: A Multi-Agent Reinforcement Learning Framework☆160Updated last month
- The evaluation benchmark on MCP servers☆234Updated 4 months ago
- WideSearch: Benchmarking Agentic Broad Info-Seeking☆110Updated 3 months ago
- Official resources of "The First Few Tokens Are All You Need: An Efficient and Effective Unsupervised Prefix Fine-Tuning Method for Reaso…☆15Updated 6 months ago
- SkillsBench evaluates how well skills work and how effective agents are at using them☆25Updated this week
- All-in-one Web Agent framework for post-training. Start building with a few clicks!☆275Updated 6 months ago
- Implementation for OAgents: An Empirical Study of Building Effective Agents☆299Updated 2 months ago
- Agent KB: Leveraging Cross-Domain Experience for Agentic Problem Solving☆410Updated 4 months ago
- DeepResearch Bench: A Comprehensive Benchmark for Deep Research Agents☆531Updated last month
- Omni Model Benchmark with high quality and diversity, which reveals the Compositional Law. We’re now focused on Chinese scenarios — and a…☆76Updated this week
- [NeurIPS 2025 D&B] 🚀 SWE-bench Goes Live!☆153Updated last week
- DeepDive: Advancing Deep Search Agents with Knowledge Graphs and Multi-Turn RL☆224Updated 3 months ago
- ScoreFlow: Mastering LLM Agent Workflows via Score-based Preference Optimization☆95Updated 7 months ago
- [EMNLP 2025] RAG-Instruct: Boosting LLMs with Diverse Retrieval-Augmented Instructions☆137Updated 8 months ago
- Data Synthesis for Deep Research Based on Semi-Structured Data☆191Updated 3 weeks ago
- SWE-Swiss: A Multi-Task Fine-Tuning and RL Recipe for High-Performance Issue Resolution☆100Updated 3 months ago
- Prompt-to-Leaderboard☆270Updated 8 months ago
- ☆130Updated 8 months ago
- Benchmark and research code for the paper SWEET-RL: Training Multi-Turn LLM Agents on Collaborative Reasoning Tasks☆254Updated 8 months ago
- IKEA: Reinforced Internal-External Knowledge Synergistic Reasoning for Efficient Adaptive Search Agent☆67Updated 7 months ago
- ☆176Updated 2 months ago
- An open platform for enhancing the capability of LLMs in workflow orchestration.☆182Updated 9 months ago
- ☆52Updated 3 months ago
- ☆360Updated 6 months ago
- ☆162Updated 3 weeks ago
- Code Implementation, Evaluations, Documentation, Links and Resources for Min P paper☆46Updated 4 months ago
- Chain-of-Agents: End-to-End Agent Foundation Models via Multi-Agent Distillation and Agentic RL.☆520Updated 4 months ago
- A minimalist MVP demonstrating a simple yet profound insight: aligning AI memory with human episodic memory granularity. Shows how this s…☆146Updated 2 weeks ago