Repository for "Scaling Evaluation-time Compute with Reasoning Models as Process Evaluators"
☆12Mar 25, 2025Updated 11 months ago
Alternatives and similar repositories for scaling-evaluation-compute
Users that are interested in scaling-evaluation-compute are comparing it to the libraries listed below.
- [ACL 2025 Main] Official Repository for "Evaluating Language Models as Synthetic Data Generators"☆41Dec 13, 2024Updated last year
- Official implementation for "MM-Eval: A Multilingual Meta-Evaluation Benchmark for LLM-as-a-Judge and Reward Models"☆18Oct 26, 2024Updated last year
- [NeurIPS 2025] "Reasoning Models Better Express Their Confidence"☆22Nov 19, 2025Updated 4 months ago
- Official code and dataset for our NAACL 2024 paper: DialogCC: An Automated Pipeline for Creating High-Quality Multi-modal Dialogue Datase…☆13Jun 24, 2024Updated last year
- Interesting ATP Proofs☆13Sep 3, 2021Updated 4 years ago
- ☆24Dec 2, 2023Updated 2 years ago
- A simple REPL for Lean 4, returning information about errors and sorries.☆12Jun 19, 2023Updated 2 years ago
- ☆33Aug 30, 2023Updated 2 years ago
- [NeurIPS 2024] Train LLMs with diverse system messages reflecting individualized preferences to generalize to unseen system messages☆53Aug 10, 2025Updated 7 months ago
- Official code and dataset for our EMNLP 2024 Findings paper: Stark: Social Long-Term Multi-Modal Conversation with Persona Commonsense Kn…☆19Dec 27, 2024Updated last year
- NeurIPS 2024 tutorial on LLM Inference☆49Dec 10, 2024Updated last year
- Evaluating Multimodal Generative AI with Korean Educational Standards, NAACL 2025.☆26May 15, 2025Updated 10 months ago
- BERT score for text generation☆12Jan 15, 2025Updated last year
- ☆14May 8, 2023Updated 2 years ago
- Evaluation of Korean language models using a self-built Korean evaluation dataset☆31May 31, 2024Updated last year
- ☆13Jan 12, 2023Updated 3 years ago
- The most modern LLM evaluation toolkit☆70Nov 9, 2025Updated 4 months ago
- [ACL 2024] LangBridge: Multilingual Reasoning Without Multilingual Supervision☆96Oct 30, 2024Updated last year
- The benchmark for finance☆10Jul 4, 2023Updated 2 years ago
- The Lean Theorem Proving Environment☆15May 7, 2023Updated 2 years ago
- All-in-one repository for Fine-tuning & Pretraining (Large) Language Models☆15Mar 8, 2023Updated 3 years ago
- Dataset and Evaluation Code for the K-QA Benchmark.☆18May 26, 2024Updated last year
- ☆12Feb 11, 2026Updated last month
- Miscellaneous code and writings for MLOps☆15Updated this week
- CliniQG4QA: Generating Diverse Questions for Domain Adaptation of Clinical Question Answering☆23Feb 26, 2021Updated 5 years ago
- Public repository for the NLP-based [Korean Math Word Problem Dataset].☆14Jun 12, 2023Updated 2 years ago
- ☆25Apr 21, 2021Updated 4 years ago
- Korean datasets available on Hugging Face☆36Oct 10, 2024Updated last year
- Codebase for the paper "The Remarkable Robustness of LLMs: Stages of Inference?"☆19Jun 11, 2025Updated 9 months ago
- An unofficial implementation of SOLAR-10.7B model and the newly proposed interlocked-DUS(iDUS) implementation and experiment details.☆14Mar 20, 2024Updated 2 years ago
- Implementation of Variational Hierarchical User-based Conversation Model☆10Jul 2, 2021Updated 4 years ago
- Code and Dataset release of "Carpe Diem: On the Evaluation of World Knowledge in Lifelong Language Models" (NAACL 2024)☆10Oct 16, 2024Updated last year
- An official codebase for "NormLens: Reading Books is Great, But Not if You Are Driving! Visually Grounded Reasoning about Defeasible Comm…☆10May 9, 2024Updated last year
- ☆22Jan 14, 2026Updated 2 months ago
- ☆12Sep 1, 2023Updated 2 years ago
- Instructions and demonstrations for building a GLM capable of formal logical reasoning☆54Sep 3, 2024Updated last year
- 🎭 Official code and dataset for our CCGPK@COLING 2022 paper - "PersonaChatGen: Generating Personalized Dialogue using GPT-3"☆13Mar 26, 2024Updated last year
- HarmAug: Effective Data Augmentation for Knowledge Distillation of Safety Guard Models☆13Mar 6, 2025Updated last year
- This repo is for Korean wiki table question answering datasets described in the paper of Korean-Specific Dataset for Table Question Answe…☆91Oct 22, 2024Updated last year