Official code for the paper, "Stop Summation: Min-Form Credit Assignment Is All Process Reward Model Needs for Reasoning"
☆167 · Oct 23, 2025 · Updated 6 months ago
Alternatives and similar repositories for PURE
Users that are interested in PURE are comparing it to the libraries listed below.
- Official code for the ICLR 2025 paper, "Scaling Offline Model-Based RL via Jointly-Optimized World-Action Model Pretraining" ☆29 · Dec 1, 2024 · Updated last year
- Official code for the ICML 2024 paper, "RIME: Robust Preference-based Reinforcement Learning with Noisy Preferences" (ICML 2024 Spotlight) ☆37 · Oct 15, 2024 · Updated last year
- Repo of the paper "Free Process Rewards without Process Labels" ☆171 · Mar 14, 2025 · Updated last year
- Scalable RL solution for advanced reasoning of language models ☆1,852 · Mar 18, 2025 · Updated last year
- From Accuracy to Robustness: A Study of Rule- and Model-based Verifiers in Mathematical Reasoning. ☆25 · Oct 7, 2025 · Updated 7 months ago
- [TMLR] Process Reward Models That Think ☆87 · Nov 29, 2025 · Updated 5 months ago
- Official repository for the ACL 2025 paper "ProcessBench: Identifying Process Errors in Mathematical Reasoning" ☆190 · May 20, 2025 · Updated 11 months ago
- [ICML 2025] Teaching Language Models to Critique via Reinforcement Learning ☆126 · May 6, 2025 · Updated last year
- Official Repo for Open-Reasoner-Zero ☆2,093 · Jun 2, 2025 · Updated 11 months ago
- Command helper for Slurm systems. Act as if you are on a compute node. ☆16 · Feb 1, 2025 · Updated last year
- Repository of the paper "Establishing Trustworthy LLM Evaluation via Shortcut Neuron Analysis" (ACL 2025 Main) ☆19 · Jul 19, 2025 · Updated 9 months ago
- ☆25 · Dec 13, 2024 · Updated last year
- B-STAR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners ☆86 · May 21, 2025 · Updated 11 months ago
- ☆34 · Oct 31, 2024 · Updated last year
- An Easy-to-use, Scalable and High-performance Agentic RL Framework based on Ray (PPO & DAPO & REINFORCE++ & VLM & TIS & vLLM & Ray & Asy… ☆9,441 · Updated this week
- ☆1,135 · Jan 10, 2026 · Updated 3 months ago
- ☆23 · Jan 31, 2025 · Updated last year
- GenRM-CoT: Data release for verification rationales ☆68 · Oct 16, 2024 · Updated last year
- Awesome-Long2short-on-LRMs is a collection of state-of-the-art, novel, exciting long2short methods on large reasoning models. It contains… ☆260 · Mar 7, 2026 · Updated last month
- A lightweight reproduction of DeepSeek-R1-Zero with in-depth analysis of self-reflection behavior. ☆249 · Apr 15, 2025 · Updated last year
- ICCV 2023 - AdaptGuard: Defending Against Universal Attacks for Model Adaptation ☆11 · Dec 23, 2023 · Updated 2 years ago
- [NeurIPS 2025 Spotlight] LLM post-training suite — featuring ReasonFlux, ReasonFlux-PRM, and ReasonFlux-Coder. ☆533 · Sep 27, 2025 · Updated 7 months ago
- Recipes to train reward models for RLHF. ☆1,531 · Apr 24, 2025 · Updated last year
- A comprehensive collection of process reward models. ☆151 · Oct 4, 2025 · Updated 7 months ago
- Code of LeCoRE ☆13 · Feb 15, 2023 · Updated 3 years ago
- A probabilistic model for contextual word representation. Accepted to ACL 2023 Findings. ☆25 · Oct 22, 2023 · Updated 2 years ago
- A recipe for online RLHF and online iterative DPO. ☆544 · Dec 28, 2024 · Updated last year
- ☆68 · Nov 26, 2024 · Updated last year
- Q-Probe: A Lightweight Approach to Reward Maximization for Language Models ☆40 · Jun 10, 2024 · Updated last year
- Official codebase for "Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling". ☆287 · Feb 19, 2025 · Updated last year
- Research code for the preprint "Optimizing Test-Time Compute via Meta Reinforcement Finetuning". ☆118 · Aug 5, 2025 · Updated 9 months ago
- [AAAI 2026] Official codebase for "GenPRM: Scaling Test-Time Compute of Process Reward Models via Generative Reasoning". ☆98 · Nov 8, 2025 · Updated 5 months ago
- Unsupervised Natural Language Parsing (Tutorial) ☆22 · Apr 19, 2021 · Updated 5 years ago
- verl/HybridFlow: A Flexible and Efficient RL Post-Training Framework ☆21,046 · Updated this week
- Policy Optimization is awesome, let’s put a tree on it! 🌲🌟 ☆22 · Jul 4, 2025 · Updated 10 months ago
- Official Repository of "Learning to Reason under Off-Policy Guidance" ☆444 · Mar 20, 2026 · Updated last month
- A Survey of Reinforcement Learning for Large Reasoning Models ☆2,449 · Nov 9, 2025 · Updated 5 months ago
- Implementation for the research paper "Enhancing LLM Reasoning via Critique Models with Test-Time and Training-Time Supervision". ☆55 · Nov 29, 2024 · Updated last year
- ☆323 · Sep 18, 2024 · Updated last year