UW-Madison-Lee-Lab/LLM-judge-reporting

A simple plug-in framework that corrects bias and computes confidence intervals in reporting LLM-as-a-judge evaluation, and an adaptive algorithm that efficiently allocates calibration samples to reduce uncertainty in estimates.

☆79

Alternatives and similar repositories for LLM-judge-reporting

Users who are interested in LLM-judge-reporting are comparing it to the libraries listed below.

apssouza22 / ai-agent-react-llm
A vanilla implementation of ReAct: Synergizing Reasoning and Acting in Language Models
☆17 · Mar 26, 2025 · Updated last year
endomorphosis / swissknife
AI powered Virtual Desktop
☆16 · Updated this week
abrvkh / explainability_toolkit
☆14 · Dec 12, 2024 · Updated last year
susumuota / nano-askllm
Unofficial implementation of the Ask-LLM paper 'How to Train Data-Efficient LLMs', arXiv:2402.09668.
☆12 · Jun 19, 2024 · Updated 2 years ago
yosefdayani / MV-RAG
MV-RAG combines retrieval with multi-view generation to create accurate 3D-consistent visuals. By retrieving reference images and text, i…
☆23 · Nov 29, 2025 · Updated 7 months ago
MaheepChaudhary / SAE-Ravel
Providing the answer to "How to do patching on all available SAEs on GPT-2?". It is an official repository of the implementation of the p…
☆13 · Jan 26, 2025 · Updated last year
harbor-framework / harbor-cookbook
Realistic examples of building evals and optimizing agents with Harbor
☆118 · Apr 23, 2026 · Updated 2 months ago
Lossfunk / KernelBench-v2
KernelBench v2: Can LLMs Write GPU Kernels? - Benchmark with Torch -> Triton (and more!) problems
☆24 · Jul 4, 2025 · Updated last year
wjko2 / INQUISITIVE
☆17 · Mar 15, 2023 · Updated 3 years ago
Columbia-NLP-Lab / LionAlignment
☆12 · Aug 6, 2024 · Updated last year
havenpersona / lycon
Copyright-free Artificial Lyrics Dataset (ISMIR 2024 LBD)
☆12 · Sep 1, 2024 · Updated last year
Virtual-Protocol / acp-python
☆26 · Mar 31, 2026 · Updated 3 months ago
wizard-III / Archer2.0
Archer2.0 evolves from its predecessor by introducing ASPO, which overcomes fundamental PPO-Clip limitations to prevent premature converg…
☆31 · Oct 10, 2025 · Updated 8 months ago
MraDonkey / rethinking_prompting
[ACL 2025 Main] (🏆 Outstanding Paper Award) Rethinking the Role of Prompting Strategies in LLM Test-Time Scaling: A Perspective of Proba…
☆18 · Aug 15, 2025 · Updated 10 months ago
giuseppe-zappia / complex-reasoning-with-react-and-langchain
Complex Reasoning with ReAct and LangChain
☆12 · Apr 24, 2024 · Updated 2 years ago
WangWenhao0716 / PDF-Embedding
[NeurIPS 2024] The official implementation of "Image Copy Detection for Diffusion Models"
☆18 · Oct 1, 2024 · Updated last year
Alex-Mathai-98 / kGym-Kernel-Playground
Kernel Playground - A playground to run large scale experiments on the Linux Kernel
☆22 · Nov 8, 2025 · Updated 7 months ago
matthelmer / DSPy-examples
Example code using the DSPy framework.
☆20 · May 30, 2024 · Updated 2 years ago
SecureAIAutonomyLab / MA-ToT
☆13 · Oct 31, 2024 · Updated last year
Sunwood-ai-labs / PEGASUS
Evolutionary Merge Experiment
☆49 · Jun 10, 2024 · Updated 2 years ago
puppetm4st3r / local_function_calling
This repository contains a Python implementation that allows you to use gorilla-llm/gorilla-openfunctions-v2 LLM to perform function call…
☆17 · Apr 7, 2024 · Updated 2 years ago
NUS-IDS / eacl23_soqg
☆15 · Mar 4, 2026 · Updated 4 months ago
davidheineman / thresh
🌾 Universal, customizable and deployable fine-grained evaluation for text generation.
☆24 · Apr 22, 2026 · Updated 2 months ago
Ascend / TransferQueue
An asynchronous streaming data management module for efficient post-training.
☆103 · Jun 29, 2026 · Updated last week
xiaofengShi / SPAR
☆25 · Jul 23, 2025 · Updated 11 months ago
mathlex / mathlex
MathLex JavaScript math entry system
☆21 · Apr 29, 2025 · Updated last year
cgoinglove / ts-edge
A lightweight, type-safe workflow engine for TypeScript that helps you create flexible, graph-based execution flows
☆28 · Jun 24, 2025 · Updated last year
ReCAP-Stanford / ReCAP
ReCAP: Recursive Context-Aware Reasoning and Planning for Large Language Model Agents, NeurIPS 2025
☆38 · Nov 15, 2025 · Updated 7 months ago
richardblythman / awesome-multi-agent-systems
A curated list of awesome resources, libraries, frameworks, and tools for multi-agent systems (MAS) research and development.
☆33 · Feb 17, 2025 · Updated last year
tilde-research / sieve
Applying SAEs for fine-grained control
☆27 · Dec 15, 2024 · Updated last year
neodyland / entropix
Unofficial entropix impl for Gemma2 and Llama and Qwen2 and Mistral
☆17 · Jan 12, 2025 · Updated last year
yinzhangyue / EoT
Exchange-of-Thought: Enhancing Large Language Model Capabilities through Cross-Model Communication
☆21 · Mar 21, 2024 · Updated 2 years ago
agent0lab / agent0-ts
TypeScript SDK for agent portability, discovery and trust based on ERC-8004.
☆67 · Mar 16, 2026 · Updated 3 months ago
Tufalabs / TextbooksToRL
☆29 · Aug 27, 2025 · Updated 10 months ago
ReedGraff / schemic
Actually Working OpenAI Structured Output
☆19 · Apr 29, 2025 · Updated last year
AgentBudget / agentbudget
AgentBudget is the ulimit for AI agents. Just like Unix systems have ulimit to prevent a single process from consuming all system resourc…
☆105 · May 30, 2026 · Updated last month
gu-fan / clickable.vim
Make things clickable
☆39 · Apr 6, 2016 · Updated 10 years ago
devpytech / gtk-gresource
Change your login theme to your gtk theme
☆12 · Jan 1, 2018 · Updated 8 years ago
shulltronics / iron-coder
An embedded Rust IDE with an emphasis on a fun and insightful coding experience
☆11 · Sep 23, 2024 · Updated last year