mukhal / ThinkPRM
Process Reward Models That Think
☆38 Updated this week
Alternatives and similar repositories for ThinkPRM
Users who are interested in ThinkPRM are comparing it to the repositories listed below.
- ☆29 Updated 2 weeks ago
- [ICML 2025] Teaching Language Models to Critique via Reinforcement Learning ☆98 Updated 3 weeks ago
- [ICML 2025] Flow of Reasoning: Training LLMs for Divergent Reasoning with Minimal Examples ☆88 Updated last week
- Repo for "Z1: Efficient Test-time Scaling with Code" ☆59 Updated last month
- ☆49 Updated 3 weeks ago
- B-STAR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners ☆82 Updated 2 weeks ago
- Agentic Reward Modeling: Integrating Human Preferences with Verifiable Correctness Signals for Reliable Reward Systems ☆90 Updated 2 months ago
- ☆45 Updated 3 months ago
- official implementation of paper "Process Reward Model with Q-value Rankings" ☆59 Updated 3 months ago
- AgentRewardBench: Evaluating Automatic Evaluations of Web Agent Trajectories ☆15 Updated 3 weeks ago
- Advancing Language Model Reasoning through Reinforcement Learning and Inference Scaling ☆102 Updated 4 months ago
- ☆46 Updated 3 months ago
- [ICLR 2025] LongPO: Long Context Self-Evolution of Large Language Models through Short-to-Long Preference Optimization ☆36 Updated 3 months ago
- [ACL 2025] Are Your LLMs Capable of Stable Reasoning? ☆25 Updated 2 months ago
- Official repo of paper LM2 ☆40 Updated 3 months ago
- The official repository for SkyLadder: Better and Faster Pretraining via Context Window Scheduling ☆32 Updated 2 months ago
- Codebase for Instruction Following without Instruction Tuning ☆34 Updated 8 months ago
- Revisiting Mid-training in the Era of RL Scaling ☆48 Updated last month
- [ICLR 2025] SuperCorrect: Advancing Small LLM Reasoning with Thought Template Distillation and Self-Correction ☆70 Updated 2 months ago
- General Reasoner: Advancing LLM Reasoning Across All Domains ☆117 Updated this week
- Code for "Critique Fine-Tuning: Learning to Critique is More Effective than Learning to Imitate" ☆151 Updated last month
- Code for EMNLP 2024 paper "Learn Beyond The Answer: Training Language Models with Reflection for Mathematical Reasoning" ☆54 Updated 8 months ago
- ☆32 Updated 3 months ago
- A Large-Scale, High-Quality Math Dataset for Reinforcement Learning in Language Models ☆55 Updated 3 months ago
- A Sober Look at Language Model Reasoning ☆52 Updated this week
- ☆113 Updated 4 months ago
- Verifiers for LLM Reinforcement Learning ☆55 Updated last month
- Interpretable Contrastive Monte Carlo Tree Search Reasoning ☆48 Updated 6 months ago
- ☆19 Updated 3 weeks ago
- ☆27 Updated last month