Official code for the paper, "Stop Summation: Min-Form Credit Assignment Is All Process Reward Model Needs for Reasoning"
☆168 · Oct 23, 2025 · Updated 7 months ago
Alternatives and similar repositories for PURE
Users that are interested in PURE are comparing it to the libraries listed below.
- Official code for ICML 2024 paper, "RIME: Robust Preference-based Reinforcement Learning with Noisy Preferences" (ICML 2024 Spotlight) · ☆35 · Oct 15, 2024 · Updated last year
- Repo of paper "Free Process Rewards without Process Labels" · ☆171 · Mar 14, 2025 · Updated last year
- Scalable RL solution for advanced reasoning of language models · ☆1,862 · Mar 18, 2025 · Updated last year
- From Accuracy to Robustness: A Study of Rule- and Model-based Verifiers in Mathematical Reasoning. · ☆25 · Oct 7, 2025 · Updated 8 months ago
- [TMLR] Process Reward Models That Think · ☆89 · Nov 29, 2025 · Updated 6 months ago
- Official repository for ACL 2025 paper "ProcessBench: Identifying Process Errors in Mathematical Reasoning" · ☆192 · May 20, 2025 · Updated last year
- [ICML 2025] Teaching Language Models to Critique via Reinforcement Learning · ☆126 · May 6, 2025 · Updated last year
- Official Repo for Open-Reasoner-Zero · ☆2,097 · Jun 2, 2025 · Updated last year
- Command helper for slurm system. Act as if you are on compute node. · ☆16 · Feb 1, 2025 · Updated last year
- Repository of paper "Establishing Trustworthy LLM Evaluation via Shortcut Neuron Analysis" (ACL 2025 Main) · ☆19 · Jul 19, 2025 · Updated 11 months ago
- ☆25 · Dec 13, 2024 · Updated last year
- Simple RL training for reasoning · ☆3,864 · Dec 23, 2025 · Updated 5 months ago
- B-STAR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners · ☆86 · May 21, 2025 · Updated last year
- ☆34 · Oct 31, 2024 · Updated last year
- An Easy-to-use, Scalable and High-performance Agentic RL Framework based on Ray (PPO & DAPO & REINFORCE++ & VLM & TIS & vLLM & Ray & Asy… · ☆9,652 · Jun 9, 2026 · Updated last week
- ☆1,152 · Jan 10, 2026 · Updated 5 months ago
- ☆22 · Jan 31, 2025 · Updated last year
- GenRM-CoT: Data release for verification rationales · ☆68 · Oct 16, 2024 · Updated last year
- Awesome-Long2short-on-LRMs is a collection of state-of-the-art, novel, exciting long2short methods on large reasoning models. It contains… · ☆260 · Mar 7, 2026 · Updated 3 months ago
- A lightweight reproduction of DeepSeek-R1-Zero with in-depth analysis of self-reflection behavior. · ☆250 · Apr 15, 2025 · Updated last year
- ICCV 2023 - AdaptGuard: Defending Against Universal Attacks for Model Adaptation · ☆11 · Dec 23, 2023 · Updated 2 years ago
- [NeurIPS 2025 Spotlight] LLM post-training suite featuring ReasonFlux, ReasonFlux-PRM, and ReasonFlux-Coder. · ☆538 · Sep 27, 2025 · Updated 8 months ago
- Recipes to train reward model for RLHF. · ☆1,534 · Apr 24, 2025 · Updated last year
- A comprehensive collection of process reward models. · ☆167 · Jun 6, 2026 · Updated last week
- Code of LeCoRE · ☆13 · Feb 15, 2023 · Updated 3 years ago
- A probabilistic model for contextual word representation. Accepted to ACL 2023 Findings. · ☆25 · Oct 22, 2023 · Updated 2 years ago
- ☆68 · Nov 26, 2024 · Updated last year
- A recipe for online RLHF and online iterative DPO. · ☆544 · Dec 28, 2024 · Updated last year
- Q-Probe: A Lightweight Approach to Reward Maximization for Language Models · ☆40 · Jun 10, 2024 · Updated 2 years ago
- Official codebase for "Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling". · ☆288 · Feb 19, 2025 · Updated last year
- [AAAI 2026] Official codebase for "GenPRM: Scaling Test-Time Compute of Process Reward Models via Generative Reasoning". · ☆100 · Nov 8, 2025 · Updated 7 months ago
- Research code for the preprint "Optimizing Test-Time Compute via Meta Reinforcement Finetuning". · ☆119 · Aug 5, 2025 · Updated 10 months ago
- Policy Optimization is awesome, let’s put a tree on it! 🌲🌟 · ☆22 · Jul 4, 2025 · Updated 11 months ago
- verl/HybridFlow: A Flexible and Efficient RL Post-Training Framework · ☆21,969 · Updated this week
- Official Repository of "Learning to Reason under Off-Policy Guidance" · ☆452 · Mar 20, 2026 · Updated 2 months ago
- A Survey of Reinforcement Learning for Large Reasoning Models · ☆2,466 · Nov 9, 2025 · Updated 7 months ago
- Implementation for the research paper "Enhancing LLM Reasoning via Critique Models with Test-Time and Training-Time Supervision". · ☆55 · Nov 29, 2024 · Updated last year
- ☆323 · Sep 18, 2024 · Updated last year
- Code for Blog Post: Can Better Cold-Start Strategies Improve RL Training for LLMs? · ☆20 · Mar 9, 2025 · Updated last year