SalesforceAIResearch / MCPEval
MCP-based Agent Deep Evaluation System
☆142Updated 3 months ago
Alternatives and similar repositories for MCPEval
Users who are interested in MCPEval are comparing it to the libraries listed below:
- The official implementation of the paper "Chain-of-Tools: Utilizing Massive Unseen Tools in the CoT Reasoning of Frozen Language Models".☆86Updated 9 months ago
- A method for steering LLMs to better follow instructions☆74Updated 5 months ago
- Official Repo for CRMArena and CRMArena-Pro☆129Updated 2 months ago
- Matrix (Multi-Agent daTa geneRation Infra and eXperimentation framework) is a versatile engine for multi-agent conversational data genera…☆259Updated this week
- The code repository of the paper: Competition and Attraction Improve Model Fusion☆169Updated 4 months ago
- Source code of "How to Correctly do Semantic Backpropagation on Language-based Agentic Systems" 🤖☆76Updated last year
- ☆94Updated this week
- ScreenSuite - The most comprehensive benchmarking suite for GUI Agents!☆135Updated 3 months ago
- Jina VDR is a multilingual, multi-domain benchmark for visual document retrieval☆38Updated 5 months ago
- Training setup for Langchain's Open Deep Research☆74Updated 4 months ago
- ☆54Updated last week
- ☆106Updated last year
- Dr. Zero Self-Evolving Search Agents without Training Data☆198Updated last week
- The Granite Guardian models are designed to detect risks in prompts and responses.☆127Updated 3 months ago
- A curated list of awesome open-source libraries for context engineering (Long-term memory, MCP: Model Context Protocol, Prompt/RAG Compre…☆103Updated 6 months ago
- ☆80Updated 3 months ago
- Leveraging Base Language Models for Few-Shot Synthetic Data Generation☆40Updated 3 months ago
- ☆87Updated last year
- ☆39Updated last year
- Code for the paper "Coding Agents with Multimodal Browsing are Generalist Problem Solvers"☆96Updated 2 months ago
- [ACL 2025] How Do LLMs Acquire New Knowledge? A Knowledge Circuits Perspective on Continual Pre-Training☆45Updated 6 months ago
- accompanying material for sleep-time compute paper☆118Updated 8 months ago
- Official Repo for The Paper "Talk Structurally, Act Hierarchically: A Collaborative Framework for LLM Multi-Agent Systems"☆60Updated 10 months ago
- [EMNLP 2025] The official implementation for paper "Agentic-R1: Distilled Dual-Strategy Reasoning"☆102Updated 4 months ago
- Experimental Code for StructuredRAG: JSON Response Formatting with Large Language Models☆114Updated 9 months ago
- Train your own SOTA deductive reasoning model☆107Updated 10 months ago
- MCP-Universe is a comprehensive framework designed for developing, testing, and benchmarking AI agents☆547Updated last week
- Verifiers for LLM Reinforcement Learning☆80Updated 9 months ago
- ☆61Updated 6 months ago
- Scaling Coding-Agent RL to 32x H100s. **Achieving 160% improvement** on Stanford's TerminalBench☆90Updated 2 months ago