Official implementation of the paper "Process Reward Model with Q-value Rankings"
☆66 · Feb 5, 2025
Alternatives and similar repositories for Process_Q_Model
Users interested in Process_Q_Model are comparing it to the repositories listed below.
- Advantage Alignment Algorithms (ICLR 2025 oral) · ☆17 · Apr 7, 2025
- Repo of paper "Free Process Rewards without Process Labels" · ☆170 · Mar 14, 2025
- Agent-RRM: Exploring Reasoning Reward Model for Agents · ☆53 · Mar 5, 2026
- Continual Memorization of Factoids in Large Language Models · ☆12 · Nov 20, 2024
- This code accompanies the paper DisentQA: Disentangling Parametric and Contextual Knowledge with Counterfactual Question Answering. · ☆16 · Mar 20, 2023
- ReST-MCTS*: LLM Self-Training via Process Reward Guided Tree Search (NeurIPS 2024) · ☆694 · Jan 20, 2025
- Research code for the preprint "Optimizing Test-Time Compute via Meta Reinforcement Finetuning" · ☆117 · Aug 5, 2025
- [NeurIPS25] RULE: Reinforcement UnLEarning Achieves Forget-Retain Pareto Optimality · ☆20 · Oct 22, 2025
- Training Proactive and Personalized LLM Agents · ☆103 · Jan 20, 2026
- CVPR 2022 (Oral) PyTorch code for Unsupervised Vision-and-Language Pre-training via Retrieval-based Multi-Granular Alignment · ☆22 · Apr 15, 2022
- Optimizing Causal LMs through GRPO with weighted reward functions and automated hyperparameter tuning using Optuna · ☆59 · Oct 18, 2025
- Simple GRPO scripts and configurations. · ☆59 · Feb 6, 2025
- Official repository for ACL 2025 paper "ProcessBench: Identifying Process Errors in Mathematical Reasoning" · ☆187 · May 20, 2025
- [ACL'25] The official code repository for PRMBench: A Fine-grained and Challenging Benchmark for Process-Level Reward Models. · ☆88 · Feb 15, 2025
- Minimal implementation of the paper "Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models" (arXiv 2401.01335) · ☆29 · Mar 1, 2024
- Replicating O1 inference-time scaling laws · ☆93 · Dec 1, 2024
- [ICLR 2024] This is the official implementation for the paper: "Beyond imitation: Leveraging fine-grained quality signals for alignment" · ☆10 · May 5, 2024
- Control LLM generation format efficiently. A simple version of microsoft/aici in vllm and transformers · ☆14 · Jun 7, 2024
- Benchmark and research code for the paper "SWEET-RL: Training Multi-Turn LLM Agents on Collaborative Reasoning Tasks" · ☆264 · May 5, 2025
- Code for the examples presented in the talk "Training a Llama in your backyard: fine-tuning very large models on consumer hardware" given… · ☆15 · Oct 16, 2023
- OpenR: An Open Source Framework for Advanced Reasoning with Large Language Models · ☆1,837 · Jan 17, 2025
- [ICML 2025] M-STAR (Multimodal Self-Evolving TrAining for Reasoning) Project. Diving into Self-Evolving Training for Multimodal Reasoning · ☆71 · Jul 13, 2025
- Source code for SWIFT, an efficient reward model. · ☆19 · Jan 13, 2026
- [ACL'24] Superfiltering: Weak-to-Strong Data Filtering for Fast Instruction-Tuning · ☆188 · Jun 25, 2025
- Investigating Cultural Alignment of Large Language Models · ☆13 · Aug 14, 2024
- [ICML 2025] Teaching Language Models to Critique via Reinforcement Learning · ☆123 · May 6, 2025
- Research code for "ArCHer: Training Language Model Agents via Hierarchical Multi-Turn RL" · ☆202 · Apr 17, 2025