InternLM/POLAR

Pre-trained, Scalable, High-performance Reward Models via Policy Discriminative Learning.

☆166

Alternatives and similar repositories for POLAR

Users who are interested in POLAR are comparing it to the repositories listed below.

OpenMOSS / Thus-Spake-Long-Context-LLM
A survey of long-context LLMs from four perspectives: architecture, infrastructure, training, and evaluation
☆62 · Mar 31, 2025 · Updated last year
InternLM / InternBootcamp
Official implementation of InternBootcamp
☆348 · Updated this week
SkyworkAI / Skywork-Reward-V2
Scaling Preference Data Curation via Human-AI Synergy
☆152 · Jul 3, 2025 · Updated last year
InternLM / Intern-S1
A Scientific Multimodal Foundation Model
☆838 · Updated this week
Kwai-Klear / RLEP
RL with Experience Replay
☆59 · Jul 27, 2025 · Updated 11 months ago
InternLM / Kernel-Smith
☆26 · Mar 31, 2026 · Updated 3 months ago
choosewhatulike / cluster-clip
Multi-GPU k-means clustering for cluster-clip
☆15 · Jun 3, 2024 · Updated 2 years ago
InternLM / OREAL
Exploring the Limit of Outcome Reward for Learning Mathematical Reasoning
☆190 · Mar 20, 2025 · Updated last year
opendatalab / REST
☆34 · Jul 15, 2025 · Updated last year
Zhou-Zoey / RMB-Reward-Model-Benchmark
☆48 · Mar 25, 2025 · Updated last year
ruixin31 / Spurious_Rewards
☆361 · Jul 29, 2025 · Updated 11 months ago
wizard-III / ArcherCodeR
ArcherCodeR is an open-source initiative enhancing code reasoning in large language models through scalable, rule-governed reinforcement …
☆44 · Aug 6, 2025 · Updated 11 months ago
WooooDyy / BAPO
Code for the paper "BAPO: Stabilizing Off-Policy Reinforcement Learning for LLMs via Balanced Policy Optimization with Adaptive Clipping…
☆94 · Jan 29, 2026 · Updated 5 months ago
OpenBMB / RLPR
Extrapolating RLVR to General Domains without Verifiers
☆205 · Aug 12, 2025 · Updated 11 months ago
RM-R1-UIUC / RM-R1
[ICLR'26] RM-R1: Unleashing the Reasoning Potential of Reward Models
☆167 · Jun 26, 2025 · Updated last year
Chengsong-Huang / Self-Calibration
Code for Efficient Test-Time Scaling via Self-Calibration
☆20 · Sep 13, 2025 · Updated 10 months ago
sunjie279 / SimCT-
☆21 · May 22, 2026 · Updated 2 months ago
yuleiqin / RAIF
A Recipe for Building LLM Reasoners to Solve Complex Instructions
☆32 · Oct 9, 2025 · Updated 9 months ago
OpenLMLab / scaling-rope
Code for Scaling Laws of RoPE-based Extrapolation
☆73 · Oct 16, 2023 · Updated 2 years ago
pzs19 / LEMMA
☆16 · Sep 4, 2025 · Updated 10 months ago
OpenMOSS / Embodied-Planner-R1
Embodied-Planner-R1: Unleashing Embodied Task Planning Ability in LLMs via Reinforcement Learning
☆27 · Mar 30, 2026 · Updated 3 months ago
LeapLabTHU / limit-of-RLVR
Repository for the paper https://arxiv.org/abs/2504.13837
☆345 · Dec 17, 2025 · Updated 7 months ago
WisdomShell / RewardAnything
RewardAnything: Generalizable Principle-Following Reward Models
☆44 · Jun 11, 2025 · Updated last year
aster2024 / SWIFT
Source code for SWIFT, an efficient reward model.
☆21 · Jan 13, 2026 · Updated 6 months ago
RUCAIBox / Passk_Training
The official repository of the paper "Pass@k Training for Adaptively Balancing Exploration and Exploitation of Large Reasoning Models"
☆113 · Aug 15, 2025 · Updated 11 months ago
ChenxinAn-fdu / POLARIS
Scaling RL on advanced reasoning models
☆691 · Oct 20, 2025 · Updated 9 months ago
suu990901 / KlearReasoner
Klear-Reasoner: Advancing Reasoning Capability via Gradient-Preserving Clipping Policy Optimization
☆82 · Dec 25, 2025 · Updated 6 months ago
open-compass / ANAH
[ACL 2024] ANAH & [NeurIPS 2024] ANAH-v2 & [ICLR 2025] Mask-DPO
☆66 · Apr 30, 2025 · Updated last year
callsys / GMPO
[ICLR 2026] Geometric-Mean Policy Optimization
☆104 · Jan 26, 2026 · Updated 5 months ago
RyanLiu112 / GenPRM
[AAAI 2026] Official codebase for "GenPRM: Scaling Test-Time Compute of Process Reward Models via Generative Reasoning".
☆102 · Nov 8, 2025 · Updated 8 months ago
langfengQ / verl-agent
verl-agent is an extension of veRL, designed for training LLM/VLM agents via RL. verl-agent is also the official code for paper "Group-in…
☆2,143 · Jun 9, 2026 · Updated last month
ypwang61 / One-Shot-RLVR
[NeurIPS 2025] Reinforcement Learning for Reasoning in Large Language Models with One Training Example
☆444 · Mar 11, 2026 · Updated 4 months ago
CodeGoat24 / UnifiedReward
Official implementation of UnifiedReward & [NeurIPS 2025] UnifiedReward-Think & UnifiedReward-Flex
☆796 · Jun 18, 2026 · Updated last month
liushulinle / UloRL
An Ultra-Long Output Reinforcement Learning Approach
☆23 · Jul 31, 2025 · Updated 11 months ago
TianHongZXY / RLVR-Decomposed
[NeurIPS 2025] Implementation for the paper "The Surprising Effectiveness of Negative Reinforcement in LLM Reasoning"
☆165 · Mar 2, 2026 · Updated 4 months ago
sail-sg / understand-r1-zero
Understanding R1-Zero-Like Training: A Critical Perspective
☆1,268 · Aug 27, 2025 · Updated 10 months ago
alibaba / ROLL
An Efficient and User-Friendly Scaling Library for Reinforcement Learning with Large Language Models
☆3,316 · Updated this week
AndreHe02 / rewarding-unlikely-release
☆15 · Jun 10, 2025 · Updated last year
mlwu22 / RED
Implementation code for ACL 2024: Advancing Parameter Efficiency in Fine-tuning via Representation Editing
☆15 · Apr 20, 2024 · Updated 2 years ago