meituan-longcat/vitabench

[ICLR 2026] VitaBench: Benchmarking LLM Agents with Versatile Interactive Tasks in Real-world Applications

☆157

Alternatives and similar repositories for vitabench

Users that are interested in vitabench are comparing it to the libraries listed below.

sierra-research / tau2-bench
τ-Bench: A Benchmark for Tool-Agent-User Interaction in Real-World Domains
☆1,622 · Updated this week
meituan-longcat / R-HORIZON
[ICLR'26] R-HORIZON: How Far Can Your Large Reasoning Model Really Go in Breadth and Depth?
☆27 · May 9, 2026 · Updated 2 months ago
RUC-NLPIR / EnvScaler
The official implementation of "EnvScaler: Scaling Tool-Interactive Environments for LLM Agent via Programmatic Synthesis".
☆175 · Feb 12, 2026 · Updated 5 months ago
hkust-nlp / Toolathlon
[ICLR 2026] The Tool Decathlon: Benchmarking Language Agents for Diverse, Realistic, and Long-Horizon Task Execution
☆430 · Updated this week
chenchen0103 / ACEBench
☆187 · Oct 29, 2025 · Updated 8 months ago
meituan / vitabench
VitaBench: Benchmarking LLM Agents with Versatile Interactive Tasks in Real-world Applications
☆23 · Oct 17, 2025 · Updated 9 months ago
meituan-longcat / LongCat-Flash-Thinking
☆287 · May 13, 2026 · Updated 2 months ago
sierra-research / tau-bench
Code and Data for Tau-Bench
☆1,337 · Mar 18, 2026 · Updated 4 months ago
eigent-ai / toolathlon_gym
Toolathlon-Gym for testing AI agents' real-world tool-use capabilities across diverse MCP servers.
☆138 · Apr 2, 2026 · Updated 3 months ago
scaleapi / mcp-atlas
MCP Atlas
☆120 · Updated this week
lukahhcm / Awesome_Environment_Scaling
Resources and paper list for 'Scaling Environments for Agents'. This repository accompanies our survey on how environments contribute to …
☆71 · Jan 28, 2026 · Updated 5 months ago
langfengQ / verl-agent
verl-agent is an extension of veRL, designed for training LLM/VLM agents via RL. verl-agent is also the official code for paper "Group-in…
☆2,138 · Jun 9, 2026 · Updated last month
meituan-longcat / LongCat-Flash-Chat
☆1,352 · Jun 23, 2026 · Updated 3 weeks ago
hkust-nlp / WebExplorer
The official repo of "WebExplorer: Explore and Evolve for Training Long-Horizon Web Agents"
☆120 · Sep 29, 2025 · Updated 9 months ago
plageon / HierSearch
HierSearch: A Hierarchical Enterprise Deep Search Framework Integrating Local and Web Searches
☆40 · Oct 9, 2025 · Updated 9 months ago
syr-cn / MemOCR
☆16 · Mar 9, 2026 · Updated 4 months ago
RUC-NLPIR / SmartSearch
☆45 · Jan 19, 2026 · Updated 6 months ago
RUCKBReasoning / CodeRM
Official code implementation for the ACL 2025 paper: 'Dynamic Scaling of Unit Tests for Code Reward Modeling'
☆27 · May 16, 2025 · Updated last year
TheAgentArk / Toucan
Official repo of Toucan: Synthesizing 1.5M Tool-Agentic Data from Real-World MCP Environments
☆256 · Dec 16, 2025 · Updated 7 months ago
AkaliKong / PaperClaw
☆22 · Mar 11, 2026 · Updated 4 months ago
MM-Thinking / Metis-RISE
Metis-RISE: RL Incentivizes and SFT Enhances Multimodal Reasoning Model Learning
☆22 · Jun 26, 2025 · Updated last year
AGI-Eval-Official / CoreCodeBench
☆16 · Nov 20, 2025 · Updated 8 months ago
ltzheng / SimpleTIR
[ICLR 2026] End-to-End Reinforcement Learning for Multi-Turn Tool-Integrated Reasoning
☆401 · Mar 30, 2026 · Updated 3 months ago
MANGA-UOFA / TokMem
☆25 · Mar 14, 2026 · Updated 4 months ago
rllm-org / rllm
Democratizing Reinforcement Learning for LLMs
☆5,708 · Updated this week
DJC-GO-SOLO / Latent-SFT
Official implementation of Latent-SFT: teaching LLMs to reason with vocabulary-space latent chains.
☆55 · May 18, 2026 · Updated 2 months ago
ByteDance-Seed / WideSearch
WideSearch: Benchmarking Agentic Broad Info-Seeking
☆147 · Oct 9, 2025 · Updated 9 months ago
1229095296 / ResRL
This repository includes code for our paper: ResRL: Boosting LLM Reasoning via Negative Sample Projection Residual Reinforcement Learning…
☆15 · May 2, 2026 · Updated 2 months ago
StarDewXXX / AdaR1
The official repository of NeurIPS'25 paper "Ada-R1: From Long-Cot to Hybrid-CoT via Bi-Level Adaptive Reasoning Optimization"
☆24 · May 6, 2026 · Updated 2 months ago
homles11 / SaLoRA
Code for “SaLoRA: Safety-Alignment Preserved Low-Rank Adaptation” (ICLR 2025)
☆29 · Oct 23, 2025 · Updated 8 months ago
uservan / ThinkPO
☆17 · Aug 1, 2025 · Updated 11 months ago
ZJU-REAL / InftyThink-Plus
[ICML 2026] InftyThink+: Effective and Efficient Infinite-Horizon Reasoning via Reinforcement Learning
☆33 · May 25, 2026 · Updated last month
meituan-longcat / LongCat-Audio-Codec
LongCat Audio Tokenizer and Detokenizer
☆301 · May 9, 2026 · Updated 2 months ago
MiroMindAI / MiroEval
MiroEval: A benchmark and evaluation framework for deep research agents — 100 tasks (70 text, 30 multimodal) assessed across synthesis qu…
☆46 · Jul 6, 2026 · Updated 2 weeks ago
HJYao00 / R1-ShareVL
[NeurIPS 2025] Reasoning MLLM, Share-GRPO, advantage vanishing, sparse reward
☆38 · Sep 19, 2025 · Updated 10 months ago
stallone0000 / Reasoning-Skill
☆20 · May 25, 2026 · Updated last month
AGI-Eval-Official / amemgym
☆40 · Apr 7, 2026 · Updated 3 months ago
SalesforceAIResearch / MCP-Universe
MCP-Universe is a comprehensive framework designed for RL training, benchmarking, and developing AI agents for general tool-use.
☆592 · Jun 23, 2026 · Updated 3 weeks ago
lgy0404 / MemGUI-Bench
[ACM MM 2026] MemGUI-Bench: Benchmarking Memory of Mobile GUI Agents in Dynamic Environments
☆46 · Jul 13, 2026 · Updated last week