CJReinforce/PURE

Official code for the paper, "Stop Summation: Min-Form Credit Assignment Is All Process Reward Model Needs for Reasoning"

☆168

Alternatives and similar repositories for PURE

Users who are interested in PURE compare it to the libraries listed below.

CJReinforce / JOWA
Official code for the ICLR 2025 paper, "Scaling Offline Model-Based RL via Jointly-Optimized World-Action Model Pretraining"
☆29 · Dec 1, 2024 · Updated last year
CJReinforce / RIME_ICML2024
Official code for ICML 2024 paper, "RIME: Robust Preference-based Reinforcement Learning with Noisy Preferences" (ICML 2024 Spotlight)
☆35 · Oct 15, 2024 · Updated last year
PRIME-RL / ImplicitPRM
Repo of paper "Free Process Rewards without Process Labels"
☆171 · Mar 14, 2025 · Updated last year
PRIME-RL / PRIME
Scalable RL solution for advanced reasoning of language models
☆1,859 · Mar 18, 2025 · Updated last year
hkust-nlp / RL-Verifier-Robustness
From Accuracy to Robustness: A Study of Rule- and Model-based Verifiers in Mathematical Reasoning.
☆25 · Oct 7, 2025 · Updated 7 months ago
mukhal / ThinkPRM
[TMLR] Process Reward Models That Think
☆89 · Nov 29, 2025 · Updated 6 months ago
QwenLM / ProcessBench
Official repository for ACL 2025 paper "ProcessBench: Identifying Process Errors in Mathematical Reasoning"
☆190 · May 20, 2025 · Updated last year
HKUNLP / critic-rl
[ICML 2025] Teaching Language Models to Critique via Reinforcement Learning
☆126 · May 6, 2025 · Updated last year
Open-Reasoner-Zero / Open-Reasoner-Zero
Official Repo for Open-Reasoner-Zero
☆2,091 · Jun 2, 2025 · Updated 11 months ago
why-in-Shanghaitech / sapp
Command helper for slurm system. Act as if you are on compute node.
☆16 · Feb 1, 2025 · Updated last year
GaryStack / Trustworthy-Evaluation
Repository of paper "Establishing Trustworthy LLM Evaluation via Shortcut Neuron Analysis" (ACL 2025 Main)
☆19 · Jul 19, 2025 · Updated 10 months ago
HypherX / Evolution-Analysis
☆25 · Dec 13, 2024 · Updated last year
hkust-nlp / simpleRL-reason
Simple RL training for reasoning
☆3,859 · Dec 23, 2025 · Updated 5 months ago
hkust-nlp / B-STaR
B-STAR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners
☆86 · May 21, 2025 · Updated last year
cmu-mind / RISE
☆34 · Oct 31, 2024 · Updated last year
OpenRLHF / OpenRLHF
An Easy-to-use, Scalable and High-performance Agentic RL Framework based on Ray (PPO & DAPO & REINFORCE++ & VLM & TIS & vLLM & Ray & Asy…
☆9,548 · Updated this week
huggingface / Math-Verify
☆1,144 · Jan 10, 2026 · Updated 4 months ago
aadityasingh / HARP
☆22 · Jan 31, 2025 · Updated last year
genrm-star / genrm-critiques
GenRM-CoT: Data release for verification rationales
☆68 · Oct 16, 2024 · Updated last year
Hongcheng-Gao / Awesome-Long2short-on-LRMs
Awesome-Long2short-on-LRMs is a collection of state-of-the-art, novel, exciting long2short methods on large reasoning models. It contains…
☆260 · Mar 7, 2026 · Updated 2 months ago
sail-sg / oat-zero
A lightweight reproduction of DeepSeek-R1-Zero with in-depth analysis of self-reflection behavior.
☆250 · Apr 15, 2025 · Updated last year
TomSheng21 / AdaptGuard
ICCV 2023 - AdaptGuard: Defending Against Universal Attacks for Model Adaptation
☆11 · Dec 23, 2023 · Updated 2 years ago
Gen-Verse / ReasonFlux
[NeurIPS 2025 Spotlight] LLM post-training suite, featuring ReasonFlux, ReasonFlux-PRM, and ReasonFlux-Coder.
☆535 · Sep 27, 2025 · Updated 8 months ago
RLHFlow / RLHF-Reward-Modeling
Recipes to train reward model for RLHF.
☆1,531 · Apr 24, 2025 · Updated last year
RyanLiu112 / Awesome-Process-Reward-Models
A comprehensive collection of process reward models.
☆158 · Oct 4, 2025 · Updated 7 months ago
whyNLP / Probabilistic-Transformer
A probabilistic model for contextual word representation. Accepted to ACL2023 Findings.
☆25 · Oct 22, 2023 · Updated 2 years ago
SkyworkAI / skywork-o1-prm-inference
☆68 · Nov 26, 2024 · Updated last year
RLHFlow / Online-RLHF
A recipe for online RLHF and online iterative DPO.
☆545 · Dec 28, 2024 · Updated last year
likenneth / q_probe
Q-Probe: A Lightweight Approach to Reward Maximization for Language Models
☆40 · Jun 10, 2024 · Updated last year
RyanLiu112 / compute-optimal-tts
Official codebase for "Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling".
☆288 · Feb 19, 2025 · Updated last year
CMU-AIRe / MRT
Research Code for preprint "Optimizing Test-Time Compute via Meta Reinforcement Finetuning".
☆118 · Aug 5, 2025 · Updated 9 months ago
RyanLiu112 / GenPRM
[AAAI 2026] Official codebase for "GenPRM: Scaling Test-Time Compute of Process Reward Models via Generative Reasoning".
☆100 · Nov 8, 2025 · Updated 6 months ago
tukw / unsupervised-parsing-tutorial
Unsupervised Natural Language Parsing (Tutorial)
☆22 · Apr 19, 2021 · Updated 5 years ago
1ring2rta / MCTS-GRPO
Policy Optimization is awesome, let’s put a tree on it! 🌲🌟
☆22 · Jul 4, 2025 · Updated 10 months ago
verl-project / verl
verl/HybridFlow: A Flexible and Efficient RL Post-Training Framework
☆21,514 · Updated this week
Simplified-Reasoning / LUFFY
Official Repository of "Learning to Reason under Off-Policy Guidance"
☆451 · Mar 20, 2026 · Updated 2 months ago
TsinghuaC3I / Awesome-RL-for-LRMs
A Survey of Reinforcement Learning for Large Reasoning Models
☆2,459 · Nov 9, 2025 · Updated 6 months ago
WooooDyy / MathCritique
Implementation for the research paper "Enhancing LLM Reasoning via Critique Models with Test-Time and Training-Time Supervision".
☆55 · Nov 29, 2024 · Updated last year
OpenBMB / Eurus
☆324 · Sep 18, 2024 · Updated last year