yinyueqin / DenseRewardRLHF-PPO
This repository contains the code and released models for the paper Segmenting Text and Learning Their Rewards for Improved RLHF in Language Model, accepted at TMLR.
19 · Jan 8, 2025 · Updated last year

Alternatives and similar repositories for DenseRewardRLHF-PPO

Users that are interested in DenseRewardRLHF-PPO are comparing it to the libraries listed below.
