[ICLR 2026] RPG: KL-Regularized Policy Gradient (https://arxiv.org/abs/2505.17508)
☆66 · Mar 30, 2026 · Updated 3 weeks ago
Alternatives and similar repositories for RPG
Users that are interested in RPG are comparing it to the libraries listed below.
- Optimizing Anytime Reasoning via Budget Relative Policy Optimization ☆54 · Jul 15, 2025 · Updated 9 months ago
- ☆11 · May 18, 2025 · Updated 11 months ago
- Official implementation of TBA for async LLM post-training. ☆30 · Nov 5, 2025 · Updated 5 months ago
- An official implementation of Random Policy Valuation is Enough for LLM Reasoning with Verifiable Rewards ☆36 · Oct 3, 2025 · Updated 6 months ago
- ☆16 · Feb 22, 2025 · Updated last year
- RENT (Reinforcement Learning via Entropy Minimization) is an unsupervised method for training reasoning LLMs. ☆43 · Oct 31, 2025 · Updated 5 months ago
- ICML 2025 Spotlight, PCEvolve: Private Contrastive Evolution for Synthetic Dataset Generation via Few-Shot Private Data and Generative AP… ☆14 · Jun 27, 2025 · Updated 10 months ago
- ☆16 · Apr 20, 2018 · Updated 8 years ago
- SimPER: A Minimalist Approach to Preference Alignment without Hyperparameters (ICLR 2025) ☆17 · Aug 22, 2025 · Updated 8 months ago
- ☆15 · Feb 26, 2025 · Updated last year
- Auditing agents for fine-tuning safety ☆20 · Oct 21, 2025 · Updated 6 months ago
- Official Repo for Error-Free Linear Attention is a Free Lunch: Exact Solution from Continuous-Time Dynamics ☆73 · Mar 26, 2026 · Updated last month
- Teaching Pretrained Language Models to Think Deeper with Retrofitted Recurrence ☆61 · Nov 11, 2025 · Updated 5 months ago
- ☆18 · Oct 26, 2024 · Updated last year
- 🌾 OAT: A research-friendly framework for LLM online alignment, including reinforcement learning, preference learning, etc. ☆650 · Jan 29, 2026 · Updated 3 months ago
- Multi-Agent Verification: Scaling Test-Time Compute with Multiple Verifiers ☆29 · Mar 1, 2025 · Updated last year
- Historical and operational core of the OMNIA diagnostics lineage inside the OMNIABASE ecosystem. ☆74 · Apr 20, 2026 · Updated last week
- [ICML 2025] Repository for M3-JEPA: Multimodal Alignment via Multi-gate MoE based on the Joint-Predictive Embedding Architecture ☆27 · Mar 13, 2026 · Updated last month
- ☆46 · Nov 1, 2025 · Updated 5 months ago
- [ICML 2025] Satori: Reinforcement Learning with Chain-of-Action-Thought Enhances LLM Reasoning via Autoregressive Search ☆111 · Jun 3, 2025 · Updated 10 months ago
- Safe SLAC, an algorithm for safe cost-constrained reinforcement learning in high-dimensional POMDPs. ☆11 · Mar 1, 2023 · Updated 3 years ago
- First neural GPT aligned with text and speech. Welcome to join us to make better foundation model in neural modality. ☆14 · Oct 30, 2024 · Updated last year
- ☆16 · Sep 25, 2025 · Updated 7 months ago
- Official implementation of Add-SD: Rational Generation without Manual Reference. ☆28 · Aug 19, 2024 · Updated last year
- ☆19 · Aug 4, 2025 · Updated 8 months ago
- [COLM 2025] Code for Paper: Learning Adaptive Parallel Reasoning with Language Models ☆143 · Dec 17, 2025 · Updated 4 months ago
- Code for EMNLP2023 paper "MolCA: Molecular Graph-Language Modeling with Cross-Modal Projector and Uni-Modal Adapter". ☆12 · Dec 27, 2023 · Updated 2 years ago
- Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment ☆62 · Aug 30, 2024 · Updated last year
- This repository is associated with the research paper titled ImageChain: Advancing Sequential Image-to-Text Reasoning in Multimodal Large… ☆15 · Jun 4, 2025 · Updated 10 months ago
- [ICLR 2026] Thinking on the Fly: Test-Time Reasoning Enhancement via Latent Thought Policy Optimization ☆26 · Mar 6, 2026 · Updated last month
- ☆359 · Jul 29, 2025 · Updated 9 months ago
- Code for the paper: CodeTree: Agent-guided Tree Search for Code Generation with Large Language Models ☆33 · Apr 1, 2025 · Updated last year
- Code space for L4DC paper "State-wise Safe Reinforcement Learning With Pixel Observations" ☆11 · Apr 5, 2024 · Updated 2 years ago
- code repo for paper accepted in ICML 2023 ☆14 · Oct 19, 2023 · Updated 2 years ago
- This is the official implementation for paper "On Powerful Ways to Generate: Autoregression, Diffusion, and Beyond". ☆21 · Nov 17, 2025 · Updated 5 months ago
- Repository for Skill Set Optimization ☆14 · Jul 26, 2024 · Updated last year
- Code for computing the hidden biases in deep networks and its applications ☆14 · Feb 23, 2023 · Updated 3 years ago
- [NeurIPS 2025] Reinforcement Learning for Reasoning in Large Language Models with One Training Example ☆426 · Mar 11, 2026 · Updated last month
- [ICLR 2026] Official PyTorch Implementation of RLP: Reinforcement as a Pretraining Objective ☆247 · Jan 26, 2026 · Updated 3 months ago