Official code for the paper, "Stop Summation: Min-Form Credit Assignment Is All Process Reward Model Needs for Reasoning"
☆158 · Updated Oct 23, 2025
Alternatives and similar repositories for PURE
Users interested in PURE are comparing it to the libraries listed below.
- Official code for the ICLR 2025 paper, "Scaling Offline Model-Based RL via Jointly-Optimized World-Action Model Pretraining" ☆28 · Updated Dec 1, 2024
- Repo of the paper "Free Process Rewards without Process Labels" ☆169 · Updated Mar 14, 2025
- Repository of the paper "Establishing Trustworthy LLM Evaluation via Shortcut Neuron Analysis" (ACL 2025 Main) ☆19 · Updated Jul 19, 2025
- Scalable RL solution for advanced reasoning of language models ☆1,811 · Updated Mar 18, 2025
- Command helper for Slurm systems. Acts as if you are on a compute node. ☆15 · Updated Feb 1, 2025
- From Accuracy to Robustness: A Study of Rule- and Model-based Verifiers in Mathematical Reasoning ☆25 · Updated Oct 7, 2025
- Process Reward Models That Think ☆80 · Updated Nov 29, 2025
- [ICML 2025] Teaching Language Models to Critique via Reinforcement Learning ☆123 · Updated May 6, 2025
- Official repository for the ACL 2025 paper "ProcessBench: Identifying Process Errors in Mathematical Reasoning" ☆184 · Updated May 20, 2025
- Official repo for Open-Reasoner-Zero ☆2,087 · Updated Jun 2, 2025
- Simple RL training for reasoning ☆3,830 · Updated Dec 23, 2025
- SCoRe: Training Language Models to Self-Correct via Reinforcement Learning ☆16 · Updated Jan 24, 2025
- ☆33 · Updated Oct 31, 2024
- ☆25 · Updated Dec 13, 2024
- B-STAR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners ☆86 · Updated May 21, 2025
- A probabilistic model for contextual word representation. Accepted to ACL 2023 Findings. ☆25 · Updated Oct 22, 2023
- An easy-to-use, scalable, and high-performance agentic RL framework based on Ray (PPO & DAPO & REINFORCE++ & TIS & vLLM & Ray & Async RL) ☆9,084 · Updated this week
- GenRM-CoT: Data release for verification rationales ☆68 · Updated Oct 16, 2024
- Policy Optimization is awesome, let's put a tree on it! 🌲🌟 ☆22 · Updated Jul 4, 2025
- ☆1,104 · Updated Jan 10, 2026
- Unsupervised Natural Language Parsing (Tutorial) ☆22 · Updated Apr 19, 2021
- [NeurIPS 2025 Spotlight] LLM post-training suite for long-CoT reasoning, PRM, and code generation — featuring ReasonFlux, ReasonFlux-PRM,… ☆521 · Updated Sep 27, 2025
- ☆265 · Updated May 14, 2025
- Benchmark and code for the paper "Evaluating LLMs at Detecting Errors in LLM Responses" ☆31 · Updated Aug 18, 2024
- Code of LeCoRE ☆13 · Updated Feb 15, 2023
- A recipe for online RLHF and online iterative DPO ☆540 · Updated Dec 28, 2024
- Recipes to train reward models for RLHF ☆1,517 · Updated Apr 24, 2025
- A comprehensive collection of process reward models ☆138 · Updated Oct 4, 2025
- A lightweight reproduction of DeepSeek-R1-Zero with in-depth analysis of self-reflection behavior ☆249 · Updated Apr 15, 2025
- Official repository of "Learning to Reason under Off-Policy Guidance" ☆417 · Updated Oct 4, 2025
- Official repository of the paper "RNNs Are Not Transformers (Yet): The Key Bottleneck on In-context Retrieval" ☆27 · Updated Apr 17, 2024
- Codebase for iterative DPO using rule-based rewards ☆269 · Updated Apr 11, 2025
- Code for the ICCV 2023 paper "Benchmarking Low-Shot Robustness to Natural Distribution Shifts" ☆11 · Updated Jan 21, 2024
- Code for research project TLDR ☆25 · Updated Jul 28, 2025
- Implementation for the paper "Fictitious Synthetic Data Can Improve LLM Factuality via Prerequisite Learning" ☆11 · Updated Jan 10, 2025
- [ICCV 2023] AdaptGuard: Defending Against Universal Attacks for Model Adaptation ☆11 · Updated Dec 23, 2023
- Official repository for "Investigating Pre-Training Objectives for Generalization in Visual Reinforcement Learning" (ICML 2024) ☆11 · Updated Sep 16, 2025
- R1-Code-Interpreter: Training LLMs to Reason with Code via Supervised and Reinforcement Learning ☆29 · Updated Feb 9, 2026
- Efficiently creating diverse multi-turn Text-to-SQL training samples in just 3 steps! 🚀 ☆14 · Updated Jan 31, 2026