CLAIRE-Labo / quantile-reward-policy-optimizationLinks
Official codebase for "Quantile Reward Policy Optimization: Alignment with Pointwise Regression and Exact Partition Functions" (Matrenok et al. 2025).
☆20Updated last week
Alternatives and similar repositories for quantile-reward-policy-optimization
Users that are interested in quantile-reward-policy-optimization are comparing it to the libraries listed below
Sorting:
- Matrix (Multi-Agent daTa geneRation Infra and eXperimentation framework) is a versatile engine for multi-agent conversational data genera…☆75Updated this week
- Train your own SOTA deductive reasoning model☆100Updated 4 months ago
- OpenCoconut implements a latent reasoning paradigm where we generate thoughts before decoding.☆173Updated 6 months ago
- ☆117Updated 5 months ago
- Archon provides a modular framework for combining different inference-time techniques and LMs with just a JSON config file.☆173Updated 4 months ago
- ☆23Updated last month
- Prune transformer layers☆69Updated last year
- The official implementation of Regularized Policy Gradient (RPG) (https://arxiv.org/abs/2505.17508)☆35Updated last week
- Training an LLM to use a calculator with multi-turn reinforcement learning, achieving a **62% absolute increase in evaluation accuracy**.