SalesforceAIResearch / MCPEval
MCP-based Agent Deep Evaluation System
☆144 Updated 4 months ago
Alternatives and similar repositories for MCPEval
Users who are interested in MCPEval are comparing it to the libraries listed below.
- The official implementation of the paper "Chain-of-Tools: Utilizing Massive Unseen Tools in the CoT Reasoning of Frozen Language Models". ☆87 Updated 10 months ago
- Official Repo for CRMArena and CRMArena-Pro ☆132 Updated this week
- Matrix (Multi-Agent daTa geneRation Infra and eXperimentation framework) is a versatile engine for multi-agent conversational data genera… ☆261 Updated last week
- ☆106 Updated last year
- ☆87 Updated last year
- Jina VDR is a multilingual, multi-domain benchmark for visual document retrieval ☆38 Updated 6 months ago
- Training Proactive and Personalized LLM Agents ☆98 Updated 3 weeks ago
- Data recipes and robust infrastructure for training AI agents ☆94 Updated this week
- A curated list of awesome open-source libraries for context engineering (Long-term memory, MCP: Model Context Protocol, Prompt/RAG Compre… ☆104 Updated 7 months ago
- A method for steering LLMs to better follow instructions ☆78 Updated 6 months ago
- Scaling Coding-Agent RL to 32x H100s. **Achieving 160% improvement** on Stanford's TerminalBench ☆92 Updated 3 months ago
- Verifiers for LLM Reinforcement Learning ☆81 Updated 4 months ago
- ☆39 Updated last year
- ☆177 Updated 11 months ago
- The code repository of the paper: Competition and Attraction Improve Model Fusion ☆169 Updated 5 months ago
- ☆80 Updated 4 months ago
- ScreenSuite - The most comprehensive benchmarking suite for GUI Agents! ☆135 Updated 4 months ago
- ☆238 Updated 2 months ago
- Training setup for LangChain's Open Deep Research ☆75 Updated 5 months ago
- Simple examples using Argilla tools to build AI ☆57 Updated last year
- GPT-4 Level Conversational QA Trained In a Few Hours ☆65 Updated last year
- Code for the paper "Coding Agents with Multimodal Browsing are Generalist Problem Solvers" ☆97 Updated 3 months ago
- Source code of "How to Correctly do Semantic Backpropagation on Language-based Agentic Systems" 🤖 ☆76 Updated last year
- ☆54 Updated 3 weeks ago
- Official Repo for The Paper "Talk Structurally, Act Hierarchically: A Collaborative Framework for LLM Multi-Agent Systems" ☆60 Updated 11 months ago
- Streamline on-policy/off-policy distillation workflows in a few lines of code ☆95 Updated this week
- [NAACL2025] LiteWebAgent: The Open-Source Suite for VLM-Based Web-Agent Applications ☆143 Updated 7 months ago
- Verifiers for LLM Reinforcement Learning ☆80 Updated 9 months ago
- ☆61 Updated 7 months ago
- Source code for the collaborative reasoner research project at Meta FAIR. ☆112 Updated 9 months ago