PRIME-RL / Entropy-Mechanism-of-RLLinks

The Entropy Mechanism of Reinforcement Learning for Large Language Model Reasoning.

☆403

Alternatives and similar repositories for Entropy-Mechanism-of-RL

Users that are interested in Entropy-Mechanism-of-RL are comparing it to the libraries listed below

Sorting:

ElliottYan / LUFFY
Official Repository of "Learning to Reason under Off-Policy Guidance"
☆392Updated 2 months ago
CJReinforce / PURE
Official code for the paper, "Stop Summation: Min-Form Credit Assignment Is All Process Reward Model Needs for Reasoning"
☆148Updated 2 months ago
GAIR-NLP / ToRL
☆321Updated 7 months ago
ltzheng / SimpleTIR
End-to-End Reinforcement Learning for Multi-Turn Tool-Integrated Reasoning
☆341Updated 3 months ago
QingyangZhang / Label-Free-RLVR
☆294Updated 5 months ago
RyanLiu112 / Awesome-Process-Reward-Models
A comprehensive collection of process reward models.
☆130Updated 2 months ago
Vance0124 / Token-level-Direct-Preference-Optimization
Reference implementation for Token-level Direct Preference Optimization(TDPO)
☆149Updated 10 months ago
cmu-l3 / l1
L1: Controlling How Long A Reasoning Model Thinks With Reinforcement Learning
☆258Updated 7 months ago
GAIR-NLP / LIMR
☆213Updated 10 months ago
TsinghuaC3I / Unify-Post-Training
Towards a Unified View of Large Language Model Post-Training
☆195Updated 3 months ago
XiaoYee / Awesome_Efficient_LRM_Reasoning
😎 A Survey of Efficient Reasoning for Large Reasoning Models: Language, Multimodality, Agent, and Beyond
☆321Updated 2 months ago
WooooDyy / MathCritique
Implementation for the research paper "Enhancing LLM Reasoning via Critique Models with Test-Time and Training-Time Supervision".
☆56Updated last year
LeapLabTHU / limit-of-RLVR
repo for paper https://arxiv.org/abs/2504.13837
☆301Updated last week
TsinghuaC3I / MARTI
A Framework for LLM-based Multi-Agent Reinforced Training and Inference
☆377Updated last month
kanishkg / cognitive-behaviors
☆218Updated 8 months ago
RyanLiu112 / GenPRM
[AAAI 2026] Official codebase for "GenPRM: Scaling Test-Time Compute of Process Reward Models via Generative Reasoning".
☆92Updated last month
ruixin31 / Spurious_Rewards
☆346Updated 4 months ago
OpenBMB / RLPR
Extrapolating RLVR to General Domains without Verifiers
☆184Updated 4 months ago
EIT-NLP / Awesome-Latent-CoT
This repository contains a regularly updated paper list for LLMs-reasoning-in-latent-space.
☆242Updated this week
multimodal-art-projection / LatentCoT-Horizon
📖 This is a repository for organizing papers, codes, and other resources related to Latent Reasoning.
☆317Updated last month
PRIME-RL / ImplicitPRM
Repo of paper "Free Process Rewards without Process Labels"
☆168Updated 9 months ago
TIGER-AI-Lab / verl-tool
A version of verl to support diverse tool use
☆766Updated 2 weeks ago
Joshua-Ren / Learning_dynamics_LLM
☆189Updated 7 months ago
liziniu / ReMax
Code for Paper (ReMax: A Simple, Efficient and Effective Reinforcement Learning Method for Aligning Large Language Models)
☆199Updated 2 years ago
YuxiXie / MCTS-DPO
This is the repository that contains the source code for the Self-Evaluation Guided MCTS for online DPO.
☆326Updated last year
Zanette-Labs / efficient-reasoning
☆72Updated 8 months ago
Blueyee / Efficient-CoT-LRMs
Chain of Thoughts (CoT) is so hot! so long! We need short reasoning process!
☆71Updated 8 months ago
JerryWu-code / TinyZero
Deepseek R1 zero tiny version own reproduce on two A100s.
☆80Updated 10 months ago
CMU-AIRe / MRT
Research Code for preprint "Optimizing Test-Time Compute via Meta Reinforcement Finetuning".
☆114Updated 4 months ago
qiancheng0 / ToolRL
☆402Updated 2 months ago