SalesforceAIResearch / MCPEval
MCP-based Agent Deep Evaluation System
☆115Updated 2 weeks ago
Alternatives and similar repositories for MCPEval
Users that are interested in MCPEval are comparing it to the libraries listed below
- The official implementation of the paper "Chain-of-Tools: Utilizing Massive Unseen Tools in the CoT Reasoning of Frozen Language Models".☆85Updated 5 months ago
- Jina VDR is a multilingual, multi-domain benchmark for visual document retrieval☆28Updated 3 weeks ago
- Official Repo for CRMArena and CRMArena-Pro☆109Updated 2 months ago
- ☆49Updated 11 months ago
- Source code for the collaborative reasoner research project at Meta FAIR.☆103Updated 4 months ago
- ScreenSuite - The most comprehensive benchmarking suite for GUI Agents!☆105Updated 3 weeks ago
- Code for the paper "Coding Agents with Multimodal Browsing are Generalist Problem Solvers"☆75Updated 2 weeks ago
- Sakura-SOLAR-DPO: Merge, SFT, and DPO☆116Updated last year
- Matrix (Multi-Agent daTa geneRation Infra and eXperimentation framework) is a versatile engine for multi-agent conversational data genera…☆88Updated last week
- Source code of "How to Correctly do Semantic Backpropagation on Language-based Agentic Systems" 🤖☆73Updated 8 months ago
- Official Repo for The Paper "Talk Structurally, Act Hierarchically: A Collaborative Framework for LLM Multi-Agent Systems"☆56Updated 6 months ago
- Query Expansion for Better Query Embedding using LLMs☆56Updated 6 months ago
- ☆98Updated 11 months ago
- GPT-4 Level Conversational QA Trained In a Few Hours☆64Updated last year
- ☆55Updated 2 months ago
- A lightweight adjustment tool for smoothing token probabilities in the Qwen models to encourage balanced multilingual generation.☆80Updated last month
- Simple examples using Argilla tools to build AI☆54Updated 9 months ago
- Using open source LLMs to build synthetic datasets for direct preference optimization☆65Updated last year
- Repository for “PlanRAG: A Plan-then-Retrieval Augmented Generation for Generative Large Language Models as Decision Makers”, NAACL24☆145Updated last year
- LLM reads a paper and produces a working prototype☆57Updated 4 months ago
- A curated list of awesome open-source libraries for context engineering (Long-term memory, MCP: Model Context Protocol, Prompt/RAG Compre…☆84Updated last month
- ☆118Updated last year
- ☆40Updated 8 months ago
- ARAGOG- Advanced RAG Output Grading. Exploring and comparing various Retrieval-Augmented Generation (RAG) techniques on AI research paper…☆108Updated last year
- 🔧 Compare how Agent systems perform on several benchmarks. 📊🚀☆100Updated 3 weeks ago
- This repository contains popular code generation frameworks such as MapCoder, CodeSIM.☆57Updated 2 months ago
- Official Code Repository for the paper "Distilling LLM Agent into Small Models with Retrieval and Code Tools"☆139Updated 3 weeks ago
- RAGElo is a set of tools that helps you select the best RAG-based LLM agents by using an Elo ranker☆114Updated this week
- Code for evaluating with Flow-Judge-v0.1 - an open-source, lightweight (3.8B) language model optimized for LLM system evaluations. Crafte…☆77Updated 10 months ago
- Toy O☆16Updated 11 months ago