tajwarfahim / maxrl
Official Implementation of "Maximum Likelihood Reinforcement Learning (MaxRL)"
☆67 · Updated this week
Alternatives and similar repositories for maxrl
Users who are interested in maxrl are comparing it to the repositories listed below
- [ICLR 2026] RPG: KL-Regularized Policy Gradient (https://arxiv.org/abs/2505.17508) ☆65 · Updated 2 weeks ago
- Official repo of paper LM2 ☆46 · Updated 11 months ago
- The official github repo for "Diffusion Language Models are Super Data Learners". ☆221 · Updated 3 months ago
- An official implementation of Random Policy Valuation is Enough for LLM Reasoning with Verifiable Rewards ☆36 · Updated 4 months ago
- Measuring the Signal to Noise Ratio in Language Model Evaluation ☆28 · Updated 5 months ago
- SPIRAL: Self-Play on Zero-Sum Games Incentivizes Reasoning via Multi-Agent Multi-Turn Reinforcement Learning ☆175 · Updated 4 months ago
- ☆88 · Updated last year
- Bayes-Adaptive RL for LLM Reasoning ☆45 · Updated 8 months ago
- nanoGRPO is a lightweight implementation of Group Relative Policy Optimization (GRPO) ☆143 · Updated 9 months ago
- Official PyTorch Implementation of the Longhorn Deep State Space Model ☆56 · Updated last year
- Official PyTorch implementation and models for paper "Diffusion Beats Autoregressive in Data-Constrained Settings". We find diffusion mod… ☆120 · Updated last month
- Esoteric Language Models ☆111 · Updated this week
- Efficiently discovering algorithms via LLMs with evolutionary search and reinforcement learning. ☆130 · Updated 2 months ago
- ☆143 · Updated 2 months ago
- Reinforcing General Reasoning without Verifiers ☆96 · Updated 7 months ago
- Multi-Agent Verification: Scaling Test-Time Compute with Multiple Verifiers ☆27 · Updated 11 months ago
- P1: Mastering Physics Olympiads with Reinforcement Learning ☆73 · Updated last month
- [Preprint] RLVE: Scaling Up Reinforcement Learning for Language Models with Adaptive Verifiable Environments ☆177 · Updated 3 weeks ago
- Defeating the Training-Inference Mismatch via FP16 ☆181 · Updated 2 months ago
- ☆33 · Updated last year
- Official Repository of Native Parallel Reasoner ☆100 · Updated this week
- A Practitioner's Guide to M(eow)ti Turn Agentic ReinfOrcement learning ☆75 · Updated 3 weeks ago
- ☆70 · Updated last year
- ☆111 · Updated 4 months ago
- Learning from preferences is a common paradigm for fine-tuning language models. Yet, many algorithmic design decisions come into play. Ou… ☆32 · Updated last year
- ☆118 · Updated 10 months ago
- Natural Language Reinforcement Learning ☆101 · Updated 6 months ago
- The official code release for Q#: Provably Optimal Distributional RL for LLM Post-Training ☆18 · Updated 11 months ago
- ☆19 · Updated 10 months ago
- ☆67 · Updated 11 months ago