SalesforceAIResearch/MCP-Universe

MCP-Universe is a comprehensive framework designed for RL training, benchmarking, and developing AI agents for general tool-use.

☆592

Alternatives and similar repositories for MCP-Universe

Users that are interested in MCP-Universe are comparing it to the libraries listed below.

Accenture / mcp-bench
MCP-Bench: Benchmarking Tool-Using LLM Agents with Complex Real-World Tasks via MCP Servers
☆496 • Oct 7, 2025 • Updated 9 months ago
scaleapi / mcp-atlas
MCP Atlas
☆130 • Jul 22, 2026 • Updated last week
eval-sys / mcpmark
MCPMark is a comprehensive, stress-testing MCP benchmark designed to evaluate model and agent capabilities in real-world MCP use.
☆453 • Jun 12, 2026 • Updated last month
Coral-Protocol / Anemoi
Anemoi: A Semi-Centralized Multi-agent Systems Based on Agent-to-Agent Communication MCP server from Coral Protocol
☆370 • Aug 27, 2025 • Updated 11 months ago
hkust-nlp / Toolathlon
[ICLR 2026] The Tool Decathlon: Benchmarking Language Agents for Diverse, Realistic, and Long-Horizon Task Execution
☆441 • Updated this week
CodeLLM-Research / CodeJudge-Eval
[COLING25] CodeJudge Eval: Can Large Language Models be Good Judges in Code Understanding?
☆12 • Dec 3, 2024 • Updated last year
SalesforceAIResearch / MCPEval
MCP-based Agent Deep Evaluation System
☆155 • Jun 2, 2026 • Updated last month
TheAgentArk / Toucan
Official repo of Toucan: Synthesizing 1.5M Tool-Agentic Data from Real-World MCP Environments
☆260 • Dec 16, 2025 • Updated 7 months ago
Danau5tin / multi-agent-coding-system
Reached #13 on Stanford's Terminal Bench leaderboard. Orchestrator, explorer & coder agents working together with intelligent context sha…
☆1,414 • Nov 3, 2025 • Updated 8 months ago
TencentCloudADP / youtu-agent
A simple yet powerful agent framework that delivers with open-source models
☆4,583 • Mar 21, 2026 • Updated 4 months ago
facebookresearch / meta-agents-research-environments
Meta Agents Research Environments is a comprehensive platform designed to evaluate AI agents in dynamic, realistic scenarios. Unlike stat…
☆532 • Updated this week
eigent-ai / toolathlon_gym
Toolathlon-Gym for testing AI agents real-world tool-use capabilities across diverse MCP servers.
☆140 • Jul 22, 2026 • Updated last week
icip-cas / LiveMCPBench
LiveMCPBench is a benchmark for evaluating the ability of agents to navigate and utilize a large-scale MCP toolset. It provides a compreh…
☆104 • Dec 18, 2025 • Updated 7 months ago
google-deepmind / limit
On the Theoretical Limitations of Embedding-Based Retrieval
☆651 • Sep 15, 2025 • Updated 10 months ago
sierra-research / tau2-bench
τ-Bench: A Benchmark for Tool-Agent-User Interaction in Real-World Domains
☆1,685 • Updated this week
SalesforceAIResearch / UserRL
The raw UserRL repo under construction
☆114 • Jun 2, 2026 • Updated last month
inclusionAI / ASearcher
An Open-Source Large-Scale Reinforcement Learning Project for Search Agents
☆602 • Nov 26, 2025 • Updated 8 months ago
sierra-research / tau-bench
Code and Data for Tau-Bench
☆1,351 • Mar 18, 2026 • Updated 4 months ago
harbor-framework / terminal-bench
A benchmark for LLMs on complicated tasks in the terminal
☆2,495 • Jul 11, 2026 • Updated 2 weeks ago
OPPO-PersonalAI / Agent_Foundation_Models
Chain-of-Agents: End-to-End Agent Foundation Models via Multi-Agent Distillation and Agentic RL.
☆580 • Sep 8, 2025 • Updated 10 months ago
rllm-org / rllm
Democratizing Reinforcement Learning for LLMs
☆5,740 • Updated this week
ScriptedAlchemy / devtools-debugger-mcp
An MCP server exposing full Chrome DevTools Protocol debugging: breakpoints, step/run, call stacks, eval, and source maps.
☆348 • Oct 2, 2025 • Updated 9 months ago
MoonshotAI / checkpoint-engine
Checkpoint-engine is a simple middleware to update model weights in LLM inference engines
☆984 • Jul 4, 2026 • Updated 3 weeks ago
GAIR-NLP / LIMI
LIMI: Less is More for Agency
☆162 • Oct 14, 2025 • Updated 9 months ago
webup / langgraph-up-react
LangGraph template for a simple ReAct agent, with MCP tools support and robust test suites.
☆564 • Sep 28, 2025 • Updated 10 months ago
axon-rl / gem
A Gym for Agentic LLMs
☆503 • Jan 21, 2026 • Updated 6 months ago
VectorSpaceLab / general-agentic-memory
A general memory system for agents, powered by deep-research
☆857 • Mar 14, 2026 • Updated 4 months ago
oxbshw / LLM-Agents-Ecosystem-Handbook
One-stop handbook for building, deploying, and understanding LLM agents with 60+ skeletons, tutorials, ecosystem guides, and evaluation t…
☆538 • Jun 30, 2026 • Updated 3 weeks ago
googleapis / gcloud-mcp
gcloud MCP server
☆870 • Updated this week
bytedance / USO
[CVPR 2026] 🔥🔥 Official Repo of USO: Unified Style and Subject-Driven Generation via Disentangled and Reward Learning
☆1,229 • Sep 12, 2025 • Updated 10 months ago
mcp-tool-bench / MCPToolBenchPP
MCPToolBench++ MCP Model Context Protocol Tool Use Benchmark on AI Agent and Model Tool Use Ability
☆44 • Mar 17, 2026 • Updated 4 months ago
TIGER-AI-Lab / verl-tool
A version of verl to support diverse tool use [TMLR 2026]
☆1,026 • Jul 15, 2026 • Updated 2 weeks ago
ltzheng / SimpleTIR
[ICLR 2026] End-to-End Reinforcement Learning for Multi-Turn Tool-Integrated Reasoning
☆401 • Mar 30, 2026 • Updated 3 months ago
SWE-Gym / SWE-Gym
Code for Paper: Training Software Engineering Agents and Verifiers with SWE-Gym [ICML 2025]
☆712 • Jul 29, 2025 • Updated last year
evalops / cognitive-dissonance-dspy
A multi-agent LLM system for detecting and resolving cognitive dissonance.
☆282 • Apr 25, 2026 • Updated 3 months ago
run-llama / agentfs-claude
Run Claude Code/Codex within AgentFS, orchestrated by LlamaIndex Workflows
☆325 • Dec 19, 2025 • Updated 7 months ago
Snowflake-Labs / agent-world-model
Agent World Model: Infinity Synthetic Environments for Agentic Reinforcement Learning
☆418 • May 28, 2026 • Updated 2 months ago
Chengsong-Huang / R-Zero
[ICLR2026] codes for R-Zero: Self-Evolving Reasoning LLM from Zero Data (https://www.arxiv.org/pdf/2508.05004)
☆825 • Feb 4, 2026 • Updated 5 months ago
harbor-framework / harbor
Framework for evaluating and improving agents
☆3,611 • Updated this week