CLAIRE-Labo / quantile-reward-policy-optimization
Official codebase for "Quantile Reward Policy Optimization: Alignment with Pointwise Regression and Exact Partition Functions" (Matrenok et al. 2025).
☆27 · Updated 3 weeks ago
Alternatives and similar repositories for quantile-reward-policy-optimization
Users who are interested in quantile-reward-policy-optimization are comparing it to the libraries listed below.
- ☆124 · Updated 9 months ago
- Official Code Release for "Training a Generally Curious Agent" ☆38 · Updated 6 months ago
- Code for NeurIPS 2024 Spotlight: "Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations" ☆85 · Updated last year
- OpenCoconut implements a latent reasoning paradigm where we generate thoughts before decoding. ☆173 · Updated 10 months ago
- Archon provides a modular framework for combining different inference-time techniques and LMs with just a JSON config file. ☆189 · Updated 8 months ago
- Matrix (Multi-Agent daTa geneRation Infra and eXperimentation framework) is a versatile engine for multi-agent conversational data genera… ☆106 · Updated this week
- Code and Configs for Asynchronous RLHF: Faster and More Efficient RL for Language Models ☆67 · Updated 7 months ago
- Large language models (LLMs) made easy: EasyLM is a one-stop solution for pre-training, finetuning, evaluating and serving LLMs in JAX/Fl… ☆76 · Updated last year
- This repo contains the source code for the paper "Evolution Strategies at Scale: LLM Fine-Tuning Beyond Reinforcement Learning" ☆261 · Updated this week
- nanoGPT-like codebase for LLM training ☆110 · Updated 3 weeks ago
- nanoGRPO is a lightweight implementation of Group Relative Policy Optimization (GRPO) ☆125 · Updated 6 months ago
- Source code for the collaborative reasoner research project at Meta FAIR. ☆106 · Updated 7 months ago
- Prune transformer layers ☆74 · Updated last year
- ☆88 · Updated last year
- [ICML 2025] Flow of Reasoning: Training LLMs for Divergent Reasoning with Minimal Examples ☆112 · Updated 4 months ago
- Reinforcing General Reasoning without Verifiers ☆92 · Updated 5 months ago
- Repository for the paper "Stream of Search: Learning to Search in Language" ☆151 · Updated 9 months ago
- NeurIPS 2024 tutorial on LLM Inference ☆47 · Updated 11 months ago
- Accompanying material for the sleep-time compute paper ☆117 · Updated 6 months ago
- The simplest, fastest repository for training/finetuning medium-sized GPTs. ☆173 · Updated 5 months ago
- ☆80 · Updated last month
- ☆35 · Updated 6 months ago
- ☆55 · Updated last year
- A simplified implementation for experimenting with RLVR on GSM8K. This repository provides a starting point for exploring reasoning. ☆145 · Updated 9 months ago
- ☆29 · Updated 3 weeks ago
- Replicating O1 inference-time scaling laws ☆90 · Updated 11 months ago
- ☆106 · Updated last month
- Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment ☆60 · Updated last year
- [NeurIPS 2024] Goldfish Loss: Mitigating Memorization in Generative LLMs ☆92 · Updated last year
- Official Repo for InSTA: Towards Internet-Scale Training For Agents ☆56 · Updated 4 months ago