RLHFlow / RLHF-Reward-Modeling
Recipes for training reward models for RLHF.
☆1,455 Updated 5 months ago
Alternatives and similar repositories for RLHF-Reward-Modeling
Users who are interested in RLHF-Reward-Modeling are comparing it to the libraries listed below.
- A recipe for online RLHF and online iterative DPO. ☆529 Updated 9 months ago
- [NeurIPS 2024] SimPO: Simple Preference Optimization with a Reference-Free Reward ☆923 Updated 7 months ago