[ICLR 2026] RPG: KL-Regularized Policy Gradient (https://arxiv.org/abs/2505.17508)
☆75 · May 13, 2026 · Updated last week
Alternatives and similar repositories for RPG
Users who are interested in RPG are comparing it to the libraries listed below.
- [ICML 2025] Beyond Bradley-Terry Models: A General Preference Model for Language Model Alignment (https://arxiv.org/abs/2410.02197) ☆41 · Sep 8, 2025 · Updated 8 months ago
- Optimizing Anytime Reasoning via Budget Relative Policy Optimization ☆55 · Jul 15, 2025 · Updated 10 months ago
- ☆11 · May 18, 2025 · Updated last year
- Official repo of Progressive Data Expansion: data, code and evaluation ☆29 · Nov 16, 2023 · Updated 2 years ago
- Don't just regulate gradients like in Muon, regulate the weights too ☆32 · Jul 30, 2025 · Updated 9 months ago
- Official implementation of TBA for async LLM post-training. ☆31 · Nov 5, 2025 · Updated 6 months ago
- An official implementation of Random Policy Valuation is Enough for LLM Reasoning with Verifiable Rewards ☆36 · Oct 3, 2025 · Updated 7 months ago
- ☆16 · Feb 22, 2025 · Updated last year
- RENT (Reinforcement Learning via Entropy Minimization) is an unsupervised method for training reasoning LLMs. ☆43 · Oct 31, 2025 · Updated 6 months ago
- ICML 2025 Spotlight, PCEvolve: Private Contrastive Evolution for Synthetic Dataset Generation via Few-Shot Private Data and Generative AP… ☆14 · Jun 27, 2025 · Updated 10 months ago
- ☆16 · Apr 20, 2018 · Updated 8 years ago
- SimPER: A Minimalist Approach to Preference Alignment without Hyperparameters (ICLR 2025) ☆17 · Aug 22, 2025 · Updated 8 months ago
- ☆15 · Feb 26, 2025 · Updated last year
- Auditing agents for fine-tuning safety ☆21 · Oct 21, 2025 · Updated 6 months ago
- Teaching Pretrained Language Models to Think Deeper with Retrofitted Recurrence ☆64 · Nov 11, 2025 · Updated 6 months ago
- 🌾 OAT: A research-friendly framework for LLM online alignment, including reinforcement learning, preference learning, etc. ☆654 · Jan 29, 2026 · Updated 3 months ago
- Multi-Agent Verification: Scaling Test-Time Compute with Multiple Verifiers ☆29 · Mar 1, 2025 · Updated last year
- Canonical public hub for the MB-X.01 / OMNIA ecosystem, lineage, boundaries, and navigation. ☆74 · May 9, 2026 · Updated last week
- ☆47 · Nov 1, 2025 · Updated 6 months ago
- Sample project to build and run Turso's SQLite fork on iOS and use vector search functionality on device ☆15 · Jul 26, 2024 · Updated last year
- [ICML 2025] Satori: Reinforcement Learning with Chain-of-Action-Thought Enhances LLM Reasoning via Autoregressive Search ☆113 · Jun 3, 2025 · Updated 11 months ago
- [EMNLP 2025] The official implementation for paper "Agentic-R1: Distilled Dual-Strategy Reasoning" ☆104 · Apr 21, 2026 · Updated 3 weeks ago
- Safe SLAC, an algorithm for safe cost-constrained reinforcement learning in high-dimensional POMDPs. ☆11 · Mar 1, 2023 · Updated 3 years ago
- OpenTinker is an RL-as-a-Service infrastructure for foundation models ☆672 · Mar 21, 2026 · Updated last month
- ☆16 · Sep 25, 2025 · Updated 7 months ago
- Official implementation of Add-SD: Rational Generation without Manual Reference. ☆28 · Aug 19, 2024 · Updated last year
- ☆19 · Aug 4, 2025 · Updated 9 months ago
- Tool to add/update nftables cgroupv2 rules for systemd-managed unit cgroups (slices, services, scopes) ☆16 · May 5, 2025 · Updated last year
- [COLM 2025] Code for Paper: Learning Adaptive Parallel Reasoning with Language Models ☆143 · Dec 17, 2025 · Updated 5 months ago
- Code for EMNLP2023 paper "MolCA: Molecular Graph-Language Modeling with Cross-Modal Projector and Uni-Modal Adapter". ☆12 · Dec 27, 2023 · Updated 2 years ago
- This repository is associated with the research paper titled ImageChain: Advancing Sequential Image-to-Text Reasoning in Multimodal Large… ☆15 · Jun 4, 2025 · Updated 11 months ago
- The source code for the paper: Yirong Mao, Ruiping Wang, Shiguang Shan, Xilin Chen. COSONet: Compact Second-Order Network for Video Face … ☆12 · Dec 27, 2018 · Updated 7 years ago
- ☆19 · Jul 24, 2025 · Updated 9 months ago
- ☆358 · Jul 29, 2025 · Updated 9 months ago
- Code for the paper: CodeTree: Agent-guided Tree Search for Code Generation with Large Language Models ☆34 · Apr 1, 2025 · Updated last year
- Documentation at ☆14 · Mar 27, 2025 · Updated last year
- Repository for Skill Set Optimization ☆14 · Jul 26, 2024 · Updated last year
- ☆15 · Mar 30, 2025 · Updated last year
- Code for computing the hidden biases in deep networks and its applications ☆14 · Feb 23, 2023 · Updated 3 years ago