UW-Madison-Lee-Lab / LLM-judge-reporting

A simple plug-in framework that corrects bias and computes confidence intervals when reporting LLM-as-a-judge evaluations, plus an adaptive algorithm that efficiently allocates calibration samples to reduce uncertainty in the estimates.
69 | Nov 27, 2025 | Updated 2 months ago
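
To make the idea of "correcting bias" in a judge-reported score concrete, here is a minimal sketch of one standard approach, the Rogan-Gladen style misclassification correction. It is an illustration under stated assumptions, not the repository's API: the function names `corrected_accuracy` and `wald_interval` are hypothetical, and the interval treats the judge's sensitivity and specificity as known constants, so it ignores the calibration-sample uncertainty that the described framework is meant to account for.

```python
import math


def corrected_accuracy(p_hat, sensitivity, specificity):
    """Bias-correct a judge-reported pass rate (Rogan-Gladen style).

    p_hat:        fraction of items the LLM judge marked correct
    sensitivity:  P(judge says correct | truly correct), assumed known
    specificity:  P(judge says incorrect | truly incorrect), assumed known
    """
    denom = sensitivity + specificity - 1.0
    if denom <= 0:
        raise ValueError("judge must beat chance: sensitivity + specificity > 1")
    theta = (p_hat + specificity - 1.0) / denom
    # Clamp to [0, 1] because sampling noise can push the estimate outside.
    return min(1.0, max(0.0, theta))


def wald_interval(p_hat, n, sensitivity, specificity, z=1.96):
    """Approximate normal interval for the corrected accuracy.

    Simplifying assumption (hypothetical, for illustration only): the
    judge's error rates are exact, so only the noise in p_hat from the
    n judged items is propagated.
    """
    denom = sensitivity + specificity - 1.0
    theta = corrected_accuracy(p_hat, sensitivity, specificity)
    se = math.sqrt(p_hat * (1.0 - p_hat) / n) / denom
    return max(0.0, theta - z * se), min(1.0, theta + z * se)


if __name__ == "__main__":
    theta = corrected_accuracy(0.70, sensitivity=0.90, specificity=0.80)
    low, high = wald_interval(0.70, n=500, sensitivity=0.90, specificity=0.80)
    print(f"corrected accuracy: {theta:.3f}, interval: ({low:.3f}, {high:.3f})")
```

In practice the sensitivity and specificity are themselves estimated from a human-labeled calibration set, which widens the interval; how many calibration samples to spend on each is the allocation question the description mentions.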

Alternatives and similar repositories for LLM-judge-reporting

Users who are interested in LLM-judge-reporting are comparing it to the libraries listed below.
