openclaw / shellbench
The agent benchmark that scores the full stack — harness, config, and model — not just the LLM. Trace-based scoring, reliability metrics, configuration diagnostics.
124 | Jun 24, 2026 | Updated last week

Alternatives and similar repositories for shellbench

Users interested in shellbench are comparing it to the libraries listed below.

