THUDM / AgentBench

A Comprehensive Benchmark to Evaluate LLMs as Agents (ICLR'24)
2,229 · Updated last week

Related projects

Alternatives and complementary repositories for AgentBench