RLHFlow / Online-DPO-R1

Codebase for Iterative DPO Using Rule-based Rewards
227 · Updated last month

Alternatives and similar repositories for Online-DPO-R1:

Users who are interested in Online-DPO-R1 are comparing it to the libraries listed below.