princeton-pli/RLMT

[R]einforcement [L]earning from [M]odel-rewarded [T]hinking - code for the paper "Language Models That Think, Chat Better"

☆128

Alternatives and similar repositories for RLMT

Users who are interested in RLMT are comparing it to the repositories listed below.

viswavi / RLCF
☆24 · Oct 23, 2025 · Updated 8 months ago
Jun-Kai-Zhang / rubrics
The official code repo of paper "Chasing the Tail: Effective Rubric-based Reward Modeling for Large Language Model Post-Training"
☆30 · Feb 20, 2026 · Updated 5 months ago
Simplified-Reasoning / LUFFY
Official Repository of "Learning to Reason under Off-Policy Guidance"
☆459 · Mar 20, 2026 · Updated 4 months ago
princeton-pli / PruLong
Code for the preprint "Cache Me If You Can: How Many KVs Do You Need for Effective Long-Context LMs?"
☆48 · Jul 29, 2025 · Updated 11 months ago
zhangxy-2019 / critique-GRPO
[ICML 2026 Spotlight] Critique-GRPO: Advancing LLM Reasoning with Natural Language and Numerical Feedback
☆70 · Jun 3, 2026 · Updated last month
TIGER-AI-Lab / General-Reasoner
General Reasoner: Advancing LLM Reasoning Across All Domains [NeurIPS25]
☆228 · Nov 27, 2025 · Updated 7 months ago
microsoft / experiential_rl
The official codebase for "Experiential Reinforcement Learning" - https://arxiv.org/pdf/2602.13949v1
☆75 · Jul 2, 2026 · Updated 2 weeks ago
cvenhoff / thinking-llms-interp
☆25 · Jul 8, 2026 · Updated last week
howard-yen / SLIM
☆27 · Jun 22, 2026 · Updated last month
IANNXANG / RuscaRL
☆48 · Jan 30, 2026 · Updated 5 months ago
rdi-berkeley / awesome-RLVR-boundary
A curated list of resources on Reinforcement Learning with Verifiable Rewards (RLVR) and the reasoning capability boundary of Large Langu…
☆89 · Dec 12, 2025 · Updated 7 months ago
sail-sg / feedback-conditional-policy
Code for "Language Models Can Learn from Verbal Feedback Without Scalar Rewards"
☆65 · Jan 5, 2026 · Updated 6 months ago
tianyi-lab / RoMA
Code for "Routing Manifold Alignment Improves Generalization of Mixture-of-Experts LLMs"
☆19 · Nov 6, 2025 · Updated 8 months ago
ShopeeLLM / Spec-RL
SPEC-RL: Accelerating On-Policy Reinforcement Learning via Speculative Rollouts
☆66 · Dec 1, 2025 · Updated 7 months ago
brucewlee / self-incrimination
Code used for "Training Agents to Self-Report Misbehavior"
☆18 · Feb 27, 2026 · Updated 4 months ago
QwenLM / RationaleRM
☆34 · Mar 18, 2026 · Updated 4 months ago
gszfwsb / AutoGnothi
Official PyTorch code for ICLR 2025 paper "Gnothi Seauton: Empowering Faithful Self-Interpretability in Black-Box Models"
☆23 · Mar 4, 2025 · Updated last year
ruixin31 / Spurious_Rewards
☆361 · Jul 29, 2025 · Updated 11 months ago
princeton-nlp / PTP
Improving Language Understanding from Screenshots. Paper: https://arxiv.org/abs/2402.14073
☆32 · Jul 9, 2024 · Updated 2 years ago
RM-R1-UIUC / RM-R1
[ICLR'26] RM-R1: Unleashing the Reasoning Potential of Reward Models
☆167 · Jun 26, 2025 · Updated last year
uestc-db / Unsupervised-Entity-Resolution
code for unsupervised entity resolution
☆10 · Apr 26, 2019 · Updated 7 years ago
ibisbill / Transferability-of-LLM-Reasoning
☆111 · Jul 6, 2026 · Updated 2 weeks ago
uw-nsl / TinyV
Your efficient and accurate answer verification system for RL training.
☆42 · Jun 23, 2025 · Updated last year
lili-chen / self-questioning-lm
Self-Questioning Language Models
☆57 · Mar 30, 2026 · Updated 3 months ago
john-hewitt / model-editing-canonical-examples
☆14 · Feb 12, 2024 · Updated 2 years ago
LeapLabTHU / limit-of-RLVR
repo for paper https://arxiv.org/abs/2504.13837
☆344 · Dec 17, 2025 · Updated 7 months ago
SalesforceAIResearch / UserRL
The raw UserRL repo under construction
☆111 · Jun 2, 2026 · Updated last month
liushulinle / MarsRL
MarsRL: Advancing Multi-Agent Reasoning System via Reinforcement Learning with Agentic Pipeline Parallelism
☆18 · Nov 18, 2025 · Updated 8 months ago
DwanZhang-AI / SePPO
Code for "SePPO: Semi-Policy Preference Optimization for Diffusion Alignment."
☆18 · Oct 7, 2024 · Updated last year
lilakk / BLEUBERI
Official repository for "BLEUBERI: BLEU is a surprisingly effective reward for instruction following"
☆32 · Jun 5, 2025 · Updated last year
zengyan-97 / MultiT-C-Dialog
A multi-task learning approach for conditioned response generation (NAACL 2021)
☆12 · Nov 18, 2022 · Updated 3 years ago
TsinghuaC3I / Unify-Post-Training
Towards a Unified View of Large Language Model Post-Training
☆211 · Sep 8, 2025 · Updated 10 months ago
complex-reasoning / RPG
[ICLR 2026] RPG: KL-Regularized Policy Gradient (https://arxiv.org/abs/2505.17508)
☆76 · Jun 29, 2026 · Updated 3 weeks ago
ZJU-REAL / cooper
☆29 · Aug 19, 2025 · Updated 11 months ago
SkyworkAI / Skywork-Reward-V2
Scaling Preference Data Curation via Human-AI Synergy
☆151 · Jul 3, 2025 · Updated last year
lasgroup / SDPO
Reinforcement Learning via Self-Distillation (SDPO)
☆1,017 · Jul 1, 2026 · Updated 2 weeks ago
jacobfa / mot
☆15 · Sep 25, 2025 · Updated 9 months ago
wutaiqiang / LLM_KD_AKL
☆22 · Oct 22, 2024 · Updated last year
ypwang61 / One-Shot-RLVR
[NeurIPS 2025] Reinforcement Learning for Reasoning in Large Language Models with One Training Example
☆444 · Mar 11, 2026 · Updated 4 months ago