haohaoXhang / RLHF_learn

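This is a codebase, built from scratch, for learning reinforcement learning from human feedback (RLHF). It implements PPO, GRPO, GSPO, and related policy optimization algorithms, and provides a clear, reproducible training pipeline. Because the documentation was converted from LaTeX files, if the .md files render incorrectly, open them with a Markdown extension in VS Code.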
30 · Updated last week

Alternatives and similar repositories for RLHF_learn

Users who are interested in RLHF_learn are comparing it to the libraries listed below.
