UW-Madison-Lee-Lab / LLM-judge-reporting
A simple plug-in framework that corrects bias and computes confidence intervals when reporting LLM-as-a-judge evaluations, plus an adaptive algorithm that efficiently allocates calibration samples to reduce uncertainty in the estimates.
73 | Nov 27, 2025 | Updated 4 months ago
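
To illustrate what "correcting bias and computing confidence intervals" can mean for a binary LLM judge, here is a minimal sketch. It is not the repository's API: the function name `corrected_accuracy` and all parameter names are hypothetical. It assumes the standard Rogan-Gladen misclassification correction with a delta-method normal interval, where the judge's sensitivity and specificity are estimated on a separate human-labeled calibration set. The repository's actual estimator and interval may differ.

```python
import math


def corrected_accuracy(p_hat, n, sens, m1, spec, m0, z=1.96):
    """Bias-corrected accuracy with an approximate confidence interval.

    Illustrative only; names and interface are assumptions.
    p_hat: fraction of n test items the judge marked correct
    sens:  judge sensitivity, estimated on m1 truly-correct calibration items
    spec:  judge specificity, estimated on m0 truly-incorrect calibration items
    z:     normal quantile (1.96 gives a roughly 95% interval)
    """
    d = sens + spec - 1.0
    if d <= 0:
        raise ValueError("judge must beat chance: sens + spec must exceed 1")

    # Rogan-Gladen point estimate, clipped to the valid range [0, 1].
    theta = (p_hat + spec - 1.0) / d
    theta = min(1.0, max(0.0, theta))

    # Delta-method variance: combines test-set noise and calibration noise.
    var = (
        p_hat * (1 - p_hat) / n
        + theta ** 2 * sens * (1 - sens) / m1
        + (1 - theta) ** 2 * spec * (1 - spec) / m0
    ) / d ** 2

    half = z * math.sqrt(var)
    return theta, (max(0.0, theta - half), min(1.0, theta + half))
```

For example, `corrected_accuracy(0.7, 1000, 0.9, 200, 0.8, 200)` adjusts a raw judge score of 0.70 using calibration estimates of 0.9 sensitivity and 0.8 specificity. The interval widens as the calibration sets shrink, which is why the allocation of calibration samples matters.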

Alternatives and similar repositories for LLM-judge-reporting

Users who are interested in LLM-judge-reporting are comparing it to the libraries listed below.
