SalesforceAIResearch / MCPEval
MCP-based Agent Deep Evaluation System
☆144 Updated 4 months ago
Alternatives and similar repositories for MCPEval
Users who are interested in MCPEval are comparing it to the libraries listed below.
- The official implementation of the paper "Chain-of-Tools: Utilizing Massive Unseen Tools in the CoT Reasoning of Frozen Language Models". ☆87 Updated 10 months ago
- A method for steering LLMs to better follow instructions ☆78 Updated 6 months ago
- Official Repo for CRMArena and CRMArena-Pro ☆132 Updated this week
- ☆106 Updated last year
- Jina VDR is a multilingual, multi-domain benchmark for visual document retrieval ☆38 Updated 6 months ago
- Code for the paper "Coding Agents with Multimodal Browsing are Generalist Problem Solvers" ☆97 Updated 3 months ago
- ☆54 Updated 3 weeks ago
- Matrix (Multi-Agent daTa geneRation Infra and eXperimentation framework) is a versatile engine for multi-agent conversational data genera… ☆261 Updated this week
- ☆39 Updated last year
- accompanying material for sleep-time compute paper ☆119 Updated 9 months ago
- Data recipes and robust infrastructure for training AI agents ☆94 Updated this week
- ☆78 Updated 4 months ago
- ☆61 Updated 7 months ago
- ☆80 Updated 4 months ago
- Official Repo for The Paper "Talk Structurally, Act Hierarchically: A Collaborative Framework for LLM Multi-Agent Systems" ☆60 Updated 11 months ago
- Source code of "How to Correctly do Semantic Backpropagation on Language-based Agentic Systems" 🤖 ☆76 Updated last year
- you.com's framework for evaluating deep research systems. ☆67 Updated 8 months ago
- GPT-4 Level Conversational QA Trained In a Few Hours ☆65 Updated last year
- Nexusflow function call, tool use, and agent benchmarks. ☆30 Updated last year
- Training Proactive and Personalized LLM Agents ☆98 Updated 3 weeks ago
- ☆87 Updated last year
- The code repository of the paper: Competition and Attraction Improve Model Fusion ☆169 Updated 5 months ago
- ScreenSuite - The most comprehensive benchmarking suite for GUI Agents! ☆137 Updated 4 months ago
- [ACL 2025] How Do LLMs Acquire New Knowledge? A Knowledge Circuits Perspective on Continual Pre-Training ☆47 Updated 6 months ago
- Training setup for Langchain's Open Deep Research ☆75 Updated 5 months ago
- ☆43 Updated 3 months ago
- MCP-Universe is a comprehensive framework designed for developing, testing, and benchmarking AI agents ☆556 Updated this week
- [EMNLP 2025] The official implementation for paper "Agentic-R1: Distilled Dual-Strategy Reasoning" ☆102 Updated 5 months ago
- Submodular optimization for context engineering: query fan-out, text selection, passage reranking ☆78 Updated 6 months ago
- ☆50 Updated last year