eval-sys / mcpmark
MCPMark is a comprehensive, stress-testing MCP benchmark designed to evaluate model and agent capabilities in real-world MCP use.
☆154 · Updated this week
Alternatives and similar repositories for mcpmark
Users interested in mcpmark are comparing it to the libraries listed below.
- Agent KB: Leveraging Cross-Domain Experience for Agentic Problem Solving ☆358 · Updated last month
- 🍎 APPL: A Prompt Programming Language. Seamlessly integrate LLMs with programs. ☆261 · Updated 7 months ago
- A lightweight script for converting HTML pages to Markdown, with support for code blocks ☆80 · Updated last year
- A minimalist MVP demonstrating a simple yet profound insight: aligning AI memory with human episodic memory granularity. Shows how this s… ☆76 · Updated 2 weeks ago
- Computer Agent Arena: Test & compare AI agents in real desktop apps & web environments. Code/data coming soon! ☆50 · Updated 5 months ago
- Official implementation for "ScoreFlow: Mastering LLM Agent Workflows via Score-based Preference Optimization" ☆83 · Updated 4 months ago
- SkillWeaver is a framework to enable web agent self-improvement through environment exploration and skill synthesis. ☆94 · Updated 5 months ago
- ☆292 · Updated 3 months ago
- [ACL 2025] AgentStore: Scalable Integration of Heterogeneous Agents As Specialized Generalist Computer Assistant ☆40 · Updated 9 months ago
- Repo for "MaskSearch: A Universal Pre-Training Framework to Enhance Agentic Search Capability" ☆146 · Updated 3 months ago
- IKEA: Reinforced Internal-External Knowledge Synergistic Reasoning for Efficient Adaptive Search Agent ☆64 · Updated 4 months ago
- Prompt-to-Leaderboard ☆254 · Updated 4 months ago
- Training Turn-by-Turn Verifiers for Dialogue Tutoring Agents: The Curious Case of LLMs as Your Coding Tutors (ACL Findings 2025) ☆83 · Updated 3 months ago
- Inference code of Lingma SWE-GPT ☆240 · Updated 9 months ago
- Ling is an MoE LLM provided and open-sourced by InclusionAI. ☆201 · Updated 4 months ago
- ☆81 · Updated 5 months ago
- MiroThinker is a series of open-source agentic models trained for deep research and complex tool-use scenarios. ☆314 · Updated this week
- The evaluation benchmark on MCP servers ☆208 · Updated 2 weeks ago
- DeepResearch Bench: A Comprehensive Benchmark for Deep Research Agents ☆389 · Updated last month
- ☆108 · Updated last month
- [ICML 2025] ResearchTown: Simulator of Human Research Community ☆174 · Updated this week
- Abacus Code LLM (珠算代码大模型) ☆56 · Updated 11 months ago
- ☆103 · Updated 9 months ago
- SWE-Swiss: A Multi-Task Fine-Tuning and RL Recipe for High-Performance Issue Resolution ☆84 · Updated this week
- An open platform for enhancing the capability of LLMs in workflow orchestration. ☆171 · Updated 6 months ago
- ☆89 · Updated 10 months ago
- Efficient Agent Training for Computer Use ☆131 · Updated 2 weeks ago
- A clean, modular SDK for building AI agents with OpenHands V1. ☆22 · Updated this week
- [EMNLP 2025] The official implementation for paper "Agentic-R1: Distilled Dual-Strategy Reasoning" ☆99 · Updated 3 weeks ago
- The official repo of SynLogic: Synthesizing Verifiable Reasoning Data at Scale for Learning Logical Reasoning and Beyond ☆166 · Updated 2 months ago