philschmid / ai-agent-benchmark-compendium
Compendium of over 50 benchmarks for evaluating AI agents, categorized into Function Calling & Tool Use, General Assistant & Reasoning, Coding & Software Engineering, and Computer Interaction.
114 · Oct 15, 2025 · Updated 5 months ago

Alternatives and similar repositories for ai-agent-benchmark-compendium

Users who are interested in ai-agent-benchmark-compendium are comparing it to the repositories listed below.
