prometheus-eval/scaling-evaluation-compute

Repository for "Scaling Evaluation-time Compute with Reasoning Models as Process Evaluators"

☆12

Alternatives and similar repositories for scaling-evaluation-compute

Users who are interested in scaling-evaluation-compute are comparing it to the repositories listed below.

neulab / data-agora
[ACL 2025 Main] Official Repository for "Evaluating Language Models as Synthetic Data Generators"
☆40 · Dec 13, 2024 · Updated last year
MattYoon / reasoning-models-confidence
[NeurIPS 2025] "Reasoning Models Better Express Their Confidence"
☆23 · Nov 19, 2025 · Updated 8 months ago
guijinSON / MM-Eval
Official implementation for "MM-Eval: A Multilingual Meta-Evaluation Benchmark for LLM-as-a-Judge and Reward Models"
☆20 · Oct 26, 2024 · Updated last year
passing2961 / DialogCC
Official code and dataset for our NAACL 2024 paper: DialogCC: An Automated Pipeline for Creating High-Quality Multi-modal Dialogue Datase…
☆13 · Jun 24, 2024 · Updated 2 years ago
ai4reason / ATP_Proofs
Interesting ATP Proofs
☆13 · Sep 3, 2021 · Updated 4 years ago
zhangir-azerbayev / repl
A simple REPL for Lean 4, returning information about errors and sorries.
☆12 · Jun 19, 2023 · Updated 3 years ago
naver-ai / ALMoST
☆24 · Dec 2, 2023 · Updated 2 years ago
EleutherAI / hae-rae
☆33 · Aug 30, 2023 · Updated 2 years ago
kaistAI / Janus
[NeurIPS 2024] Train LLMs with diverse system messages reflecting individualized preferences to generalize to unseen system messages
☆53 · Aug 10, 2025 · Updated 11 months ago
passing2961 / Stark
Official code and dataset for our EMNLP 2024 Findings paper: Stark: Social Long-Term Multi-Modal Conversation with Persona Commonsense Kn…
☆19 · Dec 27, 2024 · Updated last year
jkc-ai / mwp_kr_data
☆13 · Jan 12, 2023 · Updated 3 years ago
cmu-l3 / neurips2024-inference-tutorial-code
NeurIPS 2024 tutorial on LLM Inference
☆50 · Dec 10, 2024 · Updated last year
HAE-RAE / haerae-evaluation-toolkit
The most modern LLM evaluation toolkit
☆70 · Apr 30, 2026 · Updated 2 months ago
davidkim205 / kollm_evaluation
Evaluation of Korean language models using a self-built Korean evaluation dataset
☆31 · May 31, 2024 · Updated 2 years ago
Xinyi2016 / FInstruct
☆14 · May 8, 2023 · Updated 3 years ago
metterian / korean_bert_score
BERT score for text generation
☆12 · Jan 15, 2025 · Updated last year
The-FinAI / The-FinData
the benchmark for finance
☆11 · Jul 4, 2023 · Updated 3 years ago
kaistAI / LangBridge
[ACL 2024] LangBridge: Multilingual Reasoning Without Multilingual Supervision
☆97 · Oct 30, 2024 · Updated last year
Itaymanes / K-QA
Dataset and Evaluation Code for the K-QA Benchmark.
☆18 · May 26, 2024 · Updated 2 years ago
prometheus-eval / cmu-paper-reviewer
Code repository for the "CMU Paper Reviewer System", an agentic system that generates reviews for academic papers.
☆25 · Jun 9, 2026 · Updated last month
jesse-michael-han / lean-tpe-public
The Lean Theorem Proving Environment
☆15 · May 7, 2023 · Updated 3 years ago
joeljang / FLM
All-in-one repository for Fine-tuning & Pretraining (Large) Language Models
☆15 · Mar 8, 2023 · Updated 3 years ago
r-three / realistic_evaluation_of_model_merging_for_compositional_generalization
☆12 · Feb 11, 2026 · Updated 5 months ago
YeonwooSung / MLOps
Miscellaneous codes and writings for MLOps
☆16 · Apr 8, 2026 · Updated 3 months ago
ProgrammingWithPixels / PwP
☆23 · Feb 27, 2025 · Updated last year
jkc-ai / mwp-korean-data-2021
Public repository for the NLP-based [Korean math word problem dataset].
☆14 · Jun 12, 2023 · Updated 3 years ago
sunlab-osu / CliniQG4QA
CliniQG4QA: Generating Diverse Questions for Domain Adaptation of Clinical Question Answering
☆23 · Feb 26, 2021 · Updated 5 years ago
AndreHe02 / rewarding-unlikely-release
☆15 · Jun 10, 2025 · Updated last year
google / formal-ml
☆25 · Apr 21, 2021 · Updated 5 years ago
overfit-brothers / KRX-2024
☆12 · Dec 20, 2024 · Updated last year
songys / huggingface_KoreanDataset
Korean datasets available on Hugging Face
☆37 · Oct 10, 2024 · Updated last year
naver-ai / KoNET
Evaluating Multimodal Generative AI with Korean Educational Standards, NAACL 2025.
☆27 · May 15, 2025 · Updated last year
kimyuji / EvolvingQA_benchmark
Code and Dataset release of "Carpe Diem: On the Evaluation of World Knowledge in Lifelong Language Models" (NAACL 2024)
☆10 · Oct 16, 2024 · Updated last year
gauss5930 / iDUS
An unofficial implementation of the SOLAR-10.7B model and the newly proposed interlocked-DUS (iDUS), with implementation and experiment details.
☆14 · Mar 20, 2024 · Updated 2 years ago
teddysum / korean_evaluation
☆10 · Jun 5, 2025 · Updated last year
NoSyu / VHUCM
Implementation of Variational Hierarchical User-based Conversation Model
☆10 · Jul 2, 2021 · Updated 5 years ago
csitfun / LogiCoT
the instructions and demonstrations for building a formal logical reasoning capable GLM
☆54 · Sep 3, 2024 · Updated last year
wade3han / normlens
An official codebase for "NormLens: Reading Books is Great, But Not if You Are Driving! Visually Grounded Reasoning about Defeasible Comm…
☆10 · May 9, 2024 · Updated 2 years ago
BartoszPiotrowski / lean-premise-selection
☆22 · Jan 14, 2026 · Updated 6 months ago