Zhou-Zoey/RMB-Reward-Model-Benchmark

☆48

Alternatives and similar repositories for RMB-Reward-Model-Benchmark

Users who are interested in RMB-Reward-Model-Benchmark are comparing it to the libraries listed below.

lmarena / PPE
☆65 · May 13, 2025 · Updated last year
QwenLM / ProcessBench
Official repository for ACL 2025 paper "ProcessBench: Identifying Process Errors in Mathematical Reasoning"
☆190 · May 20, 2025 · Updated last year
cassidylaidlaw / hidden-context
Code and data for the paper "Understanding Hidden Context in Preference Learning: Consequences for RLHF"
☆35 · Dec 14, 2023 · Updated 2 years ago
RM-R1-UIUC / RM-R1
[ICLR'26] RM-R1: Unleashing the Reasoning Potential of Reward Models
☆167 · Jun 26, 2025 · Updated last year
PKU-ONELab / Themis
The official repository for our EMNLP 2024 paper, Themis: A Reference-free NLG Evaluation Language Model with Flexibility and Interpretab…
☆21 · Feb 23, 2025 · Updated last year
Shentao-YANG / Preference_Grounded_Guidance
Source code for "Preference-grounded Token-level Guidance for Language Model Fine-tuning" (NeurIPS 2023).
☆17 · Jan 8, 2025 · Updated last year
jinpz / q_sharp
The official code release for Q#: Provably Optimal Distributional RL for LLM Post-Training
☆20 · Mar 4, 2025 · Updated last year
InternLM / POLAR
Pre-trained, Scalable, High-performance Reward Models via Policy Discriminative Learning.
☆166 · Sep 23, 2025 · Updated 10 months ago
WooooDyy / BAPO
Code for the paper "BAPO: Stabilizing Off-Policy Reinforcement Learning for LLMs via Balanced Policy Optimization with Adaptive Clipping…
☆94 · Jan 29, 2026 · Updated 5 months ago
TeunvdWeij / sandbagging
☆20 · Nov 15, 2024 · Updated last year
liziniu / cold_start_rl
Code for Blog Post: Can Better Cold-Start Strategies Improve RL Training for LLMs?
☆20 · Mar 9, 2025 · Updated last year
FranxYao / Complexity-Based-Prompting
Complexity Based Prompting for Multi-Step Reasoning
☆17 · Mar 10, 2023 · Updated 3 years ago
yuh-zha / Align
Align, a general text alignment function
☆15 · Dec 7, 2023 · Updated 2 years ago
JackShDr / InfluentialRS
Implementations of Influential Recommender System
☆12 · Oct 29, 2024 · Updated last year
facebookresearch / multimodal_rewardbench
Multimodal RewardBench
☆68 · Feb 21, 2025 · Updated last year
tuhinjubcse / SimileGeneration-EMNLP2020
Code for SCOPE (Style transfer through COmmonsense PropErty), a style transfer approach to convert literal sentences to similes
☆19 · Apr 18, 2021 · Updated 5 years ago
icip-cas / Verifier-Engineering
Search, Verify and Feedback: Towards Next Generation Post-training Paradigm of Foundation Models via Verifier Engineering
☆63 · Dec 5, 2024 · Updated last year
feiyang-k / AutoScale
Official Code Repository for [AutoScale📈: Scale-Aware Data Mixing for Pre-Training LLMs] Published as a conference paper at **COLM 2025*…
☆14 · Aug 8, 2025 · Updated 11 months ago
InternLM / SWE-Fixer
☆139 · May 8, 2025 · Updated last year
baixianghuang / HalluEditBench
Can Knowledge Editing Really Correct Hallucinations? (ICLR 2025)
☆26 · Aug 10, 2025 · Updated 11 months ago
GAIR-NLP / OlympicArena
[NeurIPS 2024] OlympicArena: Benchmarking Multi-discipline Cognitive Reasoning for Superintelligent AI
☆106 · Mar 6, 2025 · Updated last year
meituan / vitabench
VitaBench: Benchmarking LLM Agents with Versatile Interactive Tasks in Real-world Applications
☆23 · Oct 17, 2025 · Updated 9 months ago
morning9393 / ETPO
☆14 · Mar 5, 2024 · Updated 2 years ago
xiusic / MinPrompt
MinPrompt: Graph-based Minimal Prompt Data Augmentation for Few-shot Question Answering
☆14 · May 3, 2024 · Updated 2 years ago
qinliu9 / Flooding-X
☆14 · Jul 13, 2022 · Updated 4 years ago
UKPLab / acl2025-diverse-cot
Code for the 2025 ACL publication "Fine-Tuning on Diverse Reasoning Chains Drives Within-Inference CoT Refinement in LLMs"
☆32 · Jun 25, 2025 · Updated last year
zankner / CLoud
Critique-out-Loud Reward Models
☆76 · Oct 18, 2024 · Updated last year
mlwu22 / RED
Implementation code for ACL 2024: Advancing Parameter Efficiency in Fine-tuning via Representation Editing
☆15 · Apr 20, 2024 · Updated 2 years ago
Wizardcoast / Linear_Alignment
This repo contains reproduction resources for the linear alignment paper; still in progress
☆17 · May 19, 2024 · Updated 2 years ago
wangyu-ustc / LargeScaleWashing
The official implementation of the paper "Large Scale Knowledge Washing"
☆10 · Jun 12, 2024 · Updated 2 years ago
AsaCooperStickland / situational-awareness-evals
Measuring the situational awareness of language models
☆41 · Feb 12, 2024 · Updated 2 years ago
Red-Hat-AI-Innovation-Team / SQuat
☆22 · Jun 5, 2025 · Updated last year
CriticBench / CriticBench
[ACL 2024 Findings] CriticBench: Benchmarking LLMs for Critique-Correct Reasoning
☆31 · Mar 5, 2024 · Updated 2 years ago
jiangshdd / ReviewCritique
☆13 · Sep 26, 2024 · Updated last year
PremiLab-Math / MathCheck
[ICLR 2025] Is Your Model Really A Good Math Reasoner? Evaluating Mathematical Reasoning with Checklist
☆34 · Oct 23, 2024 · Updated last year
anishmadan23 / MAML_Pytorch_RL
☆10 · Aug 8, 2021 · Updated 4 years ago
Junjie-Ye / ToolEyes
[COLING 2025] ToolEyes: Fine-Grained Evaluation for Tool Learning Capabilities of Large Language Models in Real-world Scenarios
☆74 · May 13, 2025 · Updated last year
choidami / inductive-oocr
☆16 · Mar 22, 2025 · Updated last year
successar / instance_attributions_NLP
☆16 · Apr 14, 2021 · Updated 5 years ago