Official code for the paper, "Stop Summation: Min-Form Credit Assignment Is All Process Reward Model Needs for Reasoning"
☆160Oct 23, 2025Updated 5 months ago
Alternatives and similar repositories for PURE
Users that are interested in PURE are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.
Sorting:
- Official code for the ICLR 2025 paper, "Scaling Offline Model-Based RL via Jointly-Optimized World-Action Model Pretraining"☆29Dec 1, 2024Updated last year
- Official code for ICML 2024 paper, "RIME: Robust Preference-based Reinforcement Learning with Noisy Preferences" (ICML 2024 Spotlight)☆36Oct 15, 2024Updated last year
- Repo of paper "Free Process Rewards without Process Labels"☆170Mar 14, 2025Updated last year
- Scalable RL solution for advanced reasoning of language models☆1,821Mar 18, 2025Updated last year
- From Accuracy to Robustness: A Study of Rule- and Model-based Verifiers in Mathematical Reasoning.☆25Oct 7, 2025Updated 5 months ago
- Wordpress hosting with auto-scaling on Cloudways • AdFully Managed hosting built for WordPress-powered businesses that need reliable, auto-scalable hosting. Cloudways SafeUpdates now available.
- [TMLR] Process Reward Models That Think☆82Nov 29, 2025Updated 3 months ago
- Official repository for ACL 2025 paper "ProcessBench: Identifying Process Errors in Mathematical Reasoning"☆187May 20, 2025Updated 10 months ago
- Official Repo for Open-Reasoner-Zero☆2,086Jun 2, 2025Updated 9 months ago
- [ICML 2025] Teaching Language Models to Critique via Reinforcement Learning☆123May 6, 2025Updated 10 months ago
- Command helper for slurm system. Act as if you are on compute node.☆15Feb 1, 2025Updated last year
- Repository of paper "Establishing Trustworthy LLM Evaluation via Shortcut Neuron Analysis" (ACL 2025 Main)☆19Jul 19, 2025Updated 8 months ago
- ☆25Dec 13, 2024Updated last year
- Simple RL training for reasoning☆3,841Dec 23, 2025Updated 3 months ago
- B-STAR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners☆86May 21, 2025Updated 10 months ago
- Managed hosting for WordPress and PHP on Cloudways • AdManaged hosting with the flexibility to host WordPress, Magento, Laravel, or PHP apps, on multiple cloud providers. Cloudways by DigitalOcean.
- An Easy-to-use, Scalable and High-performance Agentic RL Framework based on Ray (PPO & DAPO & REINFORCE++ & TIS & vLLM & Ray & Async RL)☆9,231Updated this week
- ☆34Oct 31, 2024Updated last year
- ☆1,113Jan 10, 2026Updated 2 months ago
- ☆23Jan 31, 2025Updated last year
- Awesome-Long2short-on-LRMs is a collection of state-of-the-art, novel, exciting long2short methods on large reasoning models. It contains…☆260Mar 7, 2026Updated 2 weeks ago
- GenRM-CoT: Data release for verification rationales☆67Oct 16, 2024Updated last year
- A lightweight reproduction of DeepSeek-R1-Zero with indepth analysis of self-reflection behavior.☆249Apr 15, 2025Updated 11 months ago
- ICCV 2023 - AdaptGuard: Defending Against Universal Attacks for Model Adaptation☆11Dec 23, 2023Updated 2 years ago
- [NeurIPS 2025 Spotlight] LLM post-training suite — featuring ReasonFlux, ReasonFlux-PRM, and ReasonFlux-Coder.☆524Sep 27, 2025Updated 6 months ago
- Managed hosting for WordPress and PHP on Cloudways • AdManaged hosting with the flexibility to host WordPress, Magento, Laravel, or PHP apps, on multiple cloud providers. Cloudways by DigitalOcean.
- A comprehensive collection of process reward models.☆141Oct 4, 2025Updated 5 months ago
- Code of LeCoRE☆13Feb 15, 2023Updated 3 years ago
- A probabilitic model for contextual word representation. Accepted to ACL2023 Findings.☆25Oct 22, 2023Updated 2 years ago
- A recipe for online RLHF and online iterative DPO.☆545Dec 28, 2024Updated last year
- ☆68Nov 26, 2024Updated last year
- Q-Probe: A Lightweight Approach to Reward Maximization for Language Models☆40Jun 10, 2024Updated last year
- [AAAI 2026] Official codebase for "GenPRM: Scaling Test-Time Compute of Process Reward Models via Generative Reasoning".☆94Nov 8, 2025Updated 4 months ago
- Research Code for preprint "Optimizing Test-Time Compute via Meta Reinforcement Finetuning".☆117Aug 5, 2025Updated 7 months ago
- Official codebase for "Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling".☆284Feb 19, 2025Updated last year
- DigitalOcean Gradient AI Platform • AdBuild production-ready AI agents using customizable tools or access multiple LLMs through a single endpoint. Create custom knowledge bases or connect external data.
- verl: Volcano Engine Reinforcement Learning for LLMs☆20,097Updated this week
- Official Repository of "Learning to Reason under Off-Policy Guidance"☆426Mar 20, 2026Updated last week
- Unsupervised Natural Language Parsing (Tutorial)☆22Apr 19, 2021Updated 4 years ago
- Policy Optimization is awesome, let’s put a tree on it! 🌲🌟☆22Jul 4, 2025Updated 8 months ago
- A Survey of Reinforcement Learning for Large Reasoning Models☆2,393Nov 9, 2025Updated 4 months ago
- ☆321Sep 18, 2024Updated last year
- Implementation for the research paper "Enhancing LLM Reasoning via Critique Models with Test-Time and Training-Time Supervision".☆55Nov 29, 2024Updated last year