mrconter1 / BenchmarkAggregator
Comprehensive LLM evaluation framework covering benchmarks from GPQA Diamond to Chatbot Arena. Tests all major models equally and is easily extensible.
17 · Aug 22, 2024 · Updated last year

Alternatives and similar repositories for BenchmarkAggregator

Users interested in BenchmarkAggregator are comparing it to the libraries listed below.

