SalesforceAIResearch / MCPEval
MCP-based Agent Deep Evaluation System
☆129Updated this week
Alternatives and similar repositories for MCPEval
Users who are interested in MCPEval are comparing it to the libraries listed below.
- The official implementation of the paper "Chain-of-Tools: Utilizing Massive Unseen Tools in the CoT Reasoning of Frozen Language Models".☆85Updated 5 months ago
- Official Repo for CRMArena and CRMArena-Pro☆114Updated 2 months ago
- Jina VDR is a multilingual, multi-domain benchmark for visual document retrieval☆30Updated last month
- Official Repo for The Paper "Talk Structurally, Act Hierarchically: A Collaborative Framework for LLM Multi-Agent Systems"☆57Updated 6 months ago
- accompanying material for sleep-time compute paper☆111Updated 4 months ago
- Verifiers for LLM Reinforcement Learning☆72Updated 5 months ago
- Matrix (Multi-Agent daTa geneRation Infra and eXperimentation framework) is a versatile engine for multi-agent conversational data genera…☆95Updated this week
- Source code of "How to Correctly do Semantic Backpropagation on Language-based Agentic Systems" 🤖☆75Updated 9 months ago
- ☆50Updated 11 months ago
- A curated list of awesome open-source libraries for context engineering (Long-term memory, MCP: Model Context Protocol, Prompt/RAG Compre…☆90Updated 2 months ago
- ScreenSuite - The most comprehensive benchmarking suite for GUI Agents!☆115Updated last month
- ☆40Updated 9 months ago
- A method for steering LLMs to better follow instructions☆50Updated last month
- Code that accompanies the public release of the paper Lost in Conversation (https://arxiv.org/abs/2505.06120)☆160Updated 2 months ago
- Code for the paper "Coding Agents with Multimodal Browsing are Generalist Problem Solvers"☆82Updated last week
- Routing on Random Forest (RoRF)☆205Updated 11 months ago
- The first dense retrieval model that can be prompted like an LM☆87Updated 4 months ago
- The code repository of the paper: Competition and Attraction Improve Model Fusion☆150Updated 3 weeks ago
- Codebase accompanying the Summary of a Haystack paper.☆79Updated 11 months ago
- you.com's framework for evaluating deep research systems.☆32Updated 4 months ago
- An Automatic Prompt Optimization Framework for Large Language Models☆117Updated last month
- Implementation of the paper: "AssistantBench: Can Web Agents Solve Realistic and Time-Consuming Tasks?"☆62Updated 9 months ago
- ☆99Updated last year
- An extended project of the LLM Compiler paper, focusing on developing LLM-based Autonomous Agents.☆26Updated 10 months ago
- ☆76Updated 8 months ago
- ☆48Updated last year
- ☆76Updated 6 months ago
- Source code for the collaborative reasoner research project at Meta FAIR.☆103Updated 5 months ago
- Official Code Repository for the paper "Distilling LLM Agent into Small Models with Retrieval and Code Tools"☆147Updated last month
- Systematic evaluation framework that automatically rates overthinking behavior in large language models.☆93Updated 4 months ago