Official implementation of the paper "Process Reward Model with Q-value Rankings"
☆66 · Feb 5, 2025 · Updated last year
Alternatives and similar repositories for Process_Q_Model
Users interested in Process_Q_Model are comparing it to the repositories listed below.
- ☆51 · Oct 28, 2024 · Updated last year
- ReST-MCTS*: LLM Self-Training via Process Reward Guided Tree Search (NeurIPS 2024) ☆692 · Jan 20, 2025 · Updated last year
- This code accompanies the paper "DisentQA: Disentangling Parametric and Contextual Knowledge with Counterfactual Question Answering" ☆16 · Mar 20, 2023 · Updated 2 years ago
- Advantage Alignment Algorithms (ICLR 2025 oral) ☆17 · Apr 7, 2025 · Updated 10 months ago
- Repo of the paper "Free Process Rewards without Process Labels" ☆169 · Mar 14, 2025 · Updated 11 months ago
- Research code for the preprint "Optimizing Test-Time Compute via Meta Reinforcement Finetuning" ☆116 · Aug 5, 2025 · Updated 6 months ago
- Optimizing causal LMs through GRPO with weighted reward functions and automated hyperparameter tuning using Optuna ☆59 · Oct 18, 2025 · Updated 4 months ago
- Simple GRPO scripts and configurations ☆59 · Feb 6, 2025 · Updated last year
- ☆29 · Jan 23, 2024 · Updated 2 years ago
- ☆33 · Nov 21, 2025 · Updated 3 months ago
- A DSPy adapter for exact-fidelity prompt templates with full control over messages ☆31 · Feb 13, 2026 · Updated 2 weeks ago
- Agent-RRM: Exploring Reasoning Reward Model for Agents ☆44 · Feb 4, 2026 · Updated 3 weeks ago
- ☆68 · Nov 26, 2024 · Updated last year
- ☆53 · Feb 19, 2025 · Updated last year
- ☆321 · Sep 18, 2024 · Updated last year
- Official repository for the ACL 2025 paper "ProcessBench: Identifying Process Errors in Mathematical Reasoning" ☆184 · May 20, 2025 · Updated 9 months ago
- Reference implementation for "Reward-Augmented Decoding: Efficient Controlled Text Generation With a Unidirectional Reward Model" ☆45 · Oct 1, 2025 · Updated 5 months ago
- Repo for the ACL 2023 paper "Won't Get Fooled Again: Answering Questions with False Premises" ☆22 · Jun 11, 2023 · Updated 2 years ago
- Implementation of a state-of-the-art algorithm from the paper "Learning with Noisy Labels", the first to provide "guarantees for… ☆21 · Mar 8, 2018 · Updated 7 years ago
- [ACL 2025] The official code repository for PRMBench: A Fine-grained and Challenging Benchmark for Process-Level Reward Models ☆88 · Feb 15, 2025 · Updated last year
- ☆19 · Mar 3, 2025 · Updated 11 months ago
- Self-Supervised Alignment with Mutual Information ☆20 · May 24, 2024 · Updated last year
- [NeurIPS 2024] Train LLMs with diverse system messages reflecting individualized preferences to generalize to unseen system messages ☆53 · Aug 10, 2025 · Updated 6 months ago
- Replicating o1 inference-time scaling laws ☆93 · Dec 1, 2024 · Updated last year
- ☆22 · Jun 15, 2023 · Updated 2 years ago
- A Renju game replicating the paper "Mastering the game of Go with deep neural networks and tree search" ☆20 · Jun 29, 2016 · Updated 9 years ago
- The jiant toolkit for general-purpose text understanding models ☆22 · Oct 8, 2020 · Updated 5 years ago
- ☆264 · May 14, 2025 · Updated 9 months ago
- ☆28 · Nov 10, 2025 · Updated 3 months ago
- Benchmark and research code for the paper "SWEET-RL: Training Multi-Turn LLM Agents on Collaborative Reasoning Tasks" ☆261 · May 5, 2025 · Updated 9 months ago
- Code for our EMNLP 2020 paper "Uncertainty-Aware Label Refinement for Sequence Labeling" ☆22 · Oct 4, 2020 · Updated 5 years ago
- [ICML 2025] Teaching Language Models to Critique via Reinforcement Learning ☆123 · May 6, 2025 · Updated 9 months ago
- ☆158 · Mar 18, 2023 · Updated 2 years ago
- Software Engineering Back End Microservices Project ☆15 · Nov 20, 2024 · Updated last year
- A collection of research papers on Self-Correcting Large Language Models with Automated Feedback ☆567 · Oct 28, 2024 · Updated last year
- Directional Preference Alignment ☆58 · Sep 23, 2024 · Updated last year
- Self-Alignment with Principle-Following Reward Models ☆169 · Sep 18, 2025 · Updated 5 months ago
- A curated list of awesome resources about reward construction for AI agents. This repository covers cutting-edge research, and practical … ☆55 · Sep 1, 2025 · Updated 6 months ago
- Minimal implementation of the paper "Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models" (arXiv:2401.01335) ☆29 · Mar 1, 2024 · Updated 2 years ago