RLHFlow / Online-DPO-R1
Codebase for Iterative DPO Using Rule-based Rewards
269 · Apr 11, 2025 · Updated 10 months ago

Alternatives and similar repositories for Online-DPO-R1

Users who are interested in Online-DPO-R1 compare it to the libraries listed below.
