CLAIRE-Labo / quantile-reward-policy-optimization
Official codebase for "Quantile Reward Policy Optimization: Alignment with Pointwise Regression and Exact Partition Functions" (Matrenok et al. 2025).
☆28 Updated last week
Alternatives and similar repositories for quantile-reward-policy-optimization
Users who are interested in quantile-reward-policy-optimization are comparing it to the libraries listed below
- ☆125 Updated 9 months ago
- Archon provides a modular framework for combining different inference-time techniques and LMs with just a JSON config file. ☆189 Updated 9 months ago
- Official Code Release for "Training a Generally Curious Agent" ☆39 Updated 7 months ago
- Source code for the collaborative reasoner research project at Meta FAIR. ☆111 Updated 8 months ago
- [ICML 2025] Flow of Reasoning: Training LLMs for Divergent Reasoning with Minimal Examples ☆112 Updated 4 months ago
- nanoGRPO is a lightweight implementation of Group Relative Policy Optimization (GRPO) ☆136 Updated 7 months ago
- Prune transformer layers ☆74 Updated last year
- nanoGPT-like codebase for LLM training ☆113 Updated last month
- OpenCoconut implements a latent reasoning paradigm where we generate thoughts before decoding. ☆173 Updated 11 months ago
- Code for NeurIPS 2024 Spotlight: "Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations" ☆86 Updated last year
- ☆144 Updated 3 months ago
- Replicating O1 inference-time scaling laws ☆91 Updated last year
- Code for NeurIPS'24 paper 'Grokked Transformers are Implicit Reasoners: A Mechanistic Journey to the Edge of Generalization' ☆234 Updated 5 months ago
- The simplest, fastest repository for training/finetuning medium-sized GPTs. ☆179 Updated 5 months ago
- minimal GRPO implementation from scratch ☆100 Updated 9 months ago
- Reinforcing General Reasoning without Verifiers ☆92 Updated 5 months ago
- A simplified implementation for experimenting with RLVR on GSM8K. This repository provides a starting point for exploring reasoning. ☆149 Updated 10 months ago
- This repo contains the source code for the paper "Evolution Strategies at Scale: LLM Fine-Tuning Beyond Reinforcement Learning" ☆277 Updated 3 weeks ago
- ☆89 Updated last year
- NeurIPS 2024 tutorial on LLM Inference ☆47 Updated last year
- ☆100 Updated last year
- ☆88 Updated last week
- Official repo of paper LM2 ☆46 Updated 10 months ago
- Simple repository for training small reasoning models ☆47 Updated 10 months ago
- Official Repo for InSTA: Towards Internet-Scale Training For Agents ☆55 Updated 5 months ago
- [ACL 2024] Do Large Language Models Latently Perform Multi-Hop Reasoning? ☆84 Updated 9 months ago
- Official implementation of Regularized Policy Gradient (RPG) (https://arxiv.org/abs/2505.17508) ☆58 Updated 2 months ago
- ☆52 Updated 9 months ago
- ☆55 Updated last year
- Code and Configs for Asynchronous RLHF: Faster and More Efficient RL for Language Models ☆67 Updated 7 months ago