Official code for the paper, "Stop Summation: Min-Form Credit Assignment Is All Process Reward Model Needs for Reasoning"
☆163 · Oct 23, 2025 · Updated 5 months ago
Alternatives and similar repositories for PURE
Users that are interested in PURE are comparing it to the libraries listed below.
- Official code for the ICLR 2025 paper, "Scaling Offline Model-Based RL via Jointly-Optimized World-Action Model Pretraining" ☆29 · Dec 1, 2024 · Updated last year
- Official code for ICML 2024 paper, "RIME: Robust Preference-based Reinforcement Learning with Noisy Preferences" (ICML 2024 Spotlight) ☆36 · Oct 15, 2024 · Updated last year
- Repo of paper "Free Process Rewards without Process Labels" ☆172 · Mar 14, 2025 · Updated last year
- Scalable RL solution for advanced reasoning of language models ☆1,841 · Mar 18, 2025 · Updated last year
- From Accuracy to Robustness: A Study of Rule- and Model-based Verifiers in Mathematical Reasoning. ☆25 · Oct 7, 2025 · Updated 6 months ago
- [TMLR] Process Reward Models That Think ☆84 · Nov 29, 2025 · Updated 4 months ago
- Official repository for ACL 2025 paper "ProcessBench: Identifying Process Errors in Mathematical Reasoning" ☆189 · May 20, 2025 · Updated 10 months ago
- [ICML 2025] Teaching Language Models to Critique via Reinforcement Learning ☆124 · May 6, 2025 · Updated 11 months ago
- Official Repo for Open-Reasoner-Zero ☆2,089 · Jun 2, 2025 · Updated 10 months ago
- Command helper for the slurm system. Act as if you are on a compute node. ☆15 · Feb 1, 2025 · Updated last year
- Repository of paper "Establishing Trustworthy LLM Evaluation via Shortcut Neuron Analysis" (ACL 2025 Main) ☆19 · Jul 19, 2025 · Updated 8 months ago
- ☆25 · Dec 13, 2024 · Updated last year
- Simple RL training for reasoning ☆3,846 · Dec 23, 2025 · Updated 3 months ago
- ☆34 · Oct 31, 2024 · Updated last year
- An Easy-to-use, Scalable and High-performance Agentic RL Framework based on Ray (PPO & DAPO & REINFORCE++ & VLM & TIS & vLLM & Ray & Asy… ☆9,340 · Updated this week
- ☆1,126 · Jan 10, 2026 · Updated 3 months ago
- ☆23 · Jan 31, 2025 · Updated last year
- GenRM-CoT: Data release for verification rationales ☆67 · Oct 16, 2024 · Updated last year
- Awesome-Long2short-on-LRMs is a collection of state-of-the-art, novel, exciting long2short methods on large reasoning models. It contains… ☆260 · Mar 7, 2026 · Updated last month
- A lightweight reproduction of DeepSeek-R1-Zero with in-depth analysis of self-reflection behavior. ☆249 · Apr 15, 2025 · Updated last year
- ICCV 2023 - AdaptGuard: Defending Against Universal Attacks for Model Adaptation ☆11 · Dec 23, 2023 · Updated 2 years ago
- [NeurIPS 2025 Spotlight] LLM post-training suite, featuring ReasonFlux, ReasonFlux-PRM, and ReasonFlux-Coder. ☆529 · Sep 27, 2025 · Updated 6 months ago
- Recipes to train reward model for RLHF. ☆1,527 · Apr 24, 2025 · Updated 11 months ago
- A comprehensive collection of process reward models. ☆145 · Oct 4, 2025 · Updated 6 months ago
- Code of LeCoRE ☆13 · Feb 15, 2023 · Updated 3 years ago
- A probabilistic model for contextual word representation. Accepted to ACL2023 Findings. ☆25 · Oct 22, 2023 · Updated 2 years ago
- ☆68 · Nov 26, 2024 · Updated last year
- A recipe for online RLHF and online iterative DPO. ☆543 · Dec 28, 2024 · Updated last year
- Q-Probe: A Lightweight Approach to Reward Maximization for Language Models ☆40 · Jun 10, 2024 · Updated last year
- Research Code for preprint "Optimizing Test-Time Compute via Meta Reinforcement Finetuning". ☆118 · Aug 5, 2025 · Updated 8 months ago
- Official codebase for "Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling". ☆286 · Feb 19, 2025 · Updated last year
- [AAAI 2026] Official codebase for "GenPRM: Scaling Test-Time Compute of Process Reward Models via Generative Reasoning". ☆96 · Nov 8, 2025 · Updated 5 months ago
- Unsupervised Natural Language Parsing (Tutorial) ☆22 · Apr 19, 2021 · Updated 4 years ago
- verl: Volcano Engine Reinforcement Learning for LLMs ☆20,603 · Updated this week
- Official Repository of "Learning to Reason under Off-Policy Guidance" ☆437 · Mar 20, 2026 · Updated 3 weeks ago
- Policy Optimization is awesome, let’s put a tree on it! 🌲🌟 ☆22 · Jul 4, 2025 · Updated 9 months ago
- A Survey of Reinforcement Learning for Large Reasoning Models ☆2,426 · Nov 9, 2025 · Updated 5 months ago
- Implementation for the research paper "Enhancing LLM Reasoning via Critique Models with Test-Time and Training-Time Supervision". ☆55 · Nov 29, 2024 · Updated last year
- ☆323 · Sep 18, 2024 · Updated last year