SalesforceAIResearch / MCPEval
MCP-based Agent Deep Evaluation System
☆82 · Updated 2 weeks ago
Alternatives and similar repositories for MCPEval
Users who are interested in MCPEval are comparing it to the libraries listed below
- The official implementation of the paper "Chain-of-Tools: Utilizing Massive Unseen Tools in the CoT Reasoning of Frozen Language Models" · ☆82 · Updated 4 months ago
- Official Repo for CRMArena and CRMArena-Pro · ☆104 · Updated last month
- Jina VDR is a multilingual, multi-domain benchmark for visual document retrieval · ☆22 · Updated this week
- ☆47 · Updated 10 months ago
- Matrix (Multi-Agent daTa geneRation Infra and eXperimentation framework) is a versatile engine for multi-agent conversational data genera… · ☆81 · Updated last week
- ScreenSuite - The most comprehensive benchmarking suite for GUI Agents! · ☆99 · Updated last week
- Official Repo for The Paper "Talk Structurally, Act Hierarchically: A Collaborative Framework for LLM Multi-Agent Systems" · ☆56 · Updated 5 months ago
- ☆54 · Updated last month
- Submodular optimization for context engineering: query fan-out, text selection, passage reranking · ☆67 · Updated 3 weeks ago
- Verifiers for LLM Reinforcement Learning · ☆68 · Updated 3 months ago
- A curated list of awesome open-source libraries for context engineering (Long-term memory, MCP: Model Context Protocol, Prompt/RAG Compre… · ☆79 · Updated last month
- An extended project of the LLM Compiler paper, focusing on developing LLM-based Autonomous Agents · ☆25 · Updated 9 months ago
- ☆96 · Updated 10 months ago
- Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment · ☆60 · Updated 11 months ago
- ☆40 · Updated 7 months ago
- Toy O · ☆16 · Updated 10 months ago
- SiriuS: Self-improving Multi-agent Systems via Bootstrapped Reasoning · ☆61 · Updated 3 weeks ago
- ☆78 · Updated 9 months ago
- LLM reads a paper and produces a working prototype · ☆58 · Updated 3 months ago
- Query Expansion for Better Query Embedding using LLMs · ☆55 · Updated 5 months ago
- Build complex LLM Applications with Python Dictionary · ☆40 · Updated 9 months ago
- A framework for high-fidelity retrieval augmented generation in industrial knowledge bases. Integrates jargon identification, context rec… · ☆33 · Updated 11 months ago
- ☆44 · Updated 3 months ago
- Collection of scripts and notebooks for OpenAI's latest GPT OSS models · ☆222 · Updated this week
- Source code for the collaborative reasoner research project at Meta FAIR. · ☆99 · Updated 3 months ago
- 🔎 A deep-dive into HyDE for Advanced LLM RAG + 💡 Introducing AutoHyDE, a semi-supervised framework to improve the effectiveness, covera… · ☆32 · Updated last year
- ☆13 · Updated last week
- ☆46 · Updated 2 months ago
- ☆62 · Updated last month
- Official Code Repository for the paper "Distilling LLM Agent into Small Models with Retrieval and Code Tools" · ☆130 · Updated this week