RLHFlow / Online-DPO-R1

Codebase for Iterative DPO Using Rule-based Rewards
243 · Updated last month

Alternatives and similar repositories for Online-DPO-R1

Users who are interested in Online-DPO-R1 compare it to the libraries listed below.
