Asap7772 / understanding-rlhf

Learning from preferences is a common paradigm for fine-tuning language models. Yet, many algorithmic design decisions come into play. Our new work finds that approaches employing on-policy sampling or negative gradients outperform offline, maximum likelihood objectives.
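To make the contrast concrete, here is a minimal sketch of the two objective families the description refers to: a purely offline maximum likelihood loss that only pushes up the probability of preferred responses, versus a contrastive loss with a negative gradient (a DPO-style objective) that also pushes down the probability of dispreferred responses. This is not code from the understanding-rlhf repository; the model interface (a HuggingFace-style causal LM returning `.logits`) and all names are illustrative assumptions.

```python
import torch
import torch.nn.functional as F


def sequence_logprob(model, input_ids):
    """Sum of per-token log-probabilities of input_ids under the model.

    Assumes a causal LM whose forward pass returns an object with `.logits`
    of shape (batch, seq_len, vocab), as in HuggingFace Transformers.
    """
    logits = model(input_ids).logits[:, :-1, :]   # logits predicting the next token
    targets = input_ids[:, 1:]                    # shift targets by one position
    logps = torch.log_softmax(logits, dim=-1)
    token_logps = logps.gather(-1, targets.unsqueeze(-1)).squeeze(-1)
    return token_logps.sum(dim=-1)                # (batch,)


def maximum_likelihood_loss(policy, chosen_ids):
    # Offline maximum likelihood: only increases the probability of the
    # preferred (chosen) responses; no term pushes anything down.
    return -sequence_logprob(policy, chosen_ids).mean()


def dpo_loss(policy, reference, chosen_ids, rejected_ids, beta=0.1):
    # Contrastive objective with a negative gradient: the rejected response's
    # probability is explicitly pushed down relative to a frozen reference model.
    pi_chosen = sequence_logprob(policy, chosen_ids)
    pi_rejected = sequence_logprob(policy, rejected_ids)
    with torch.no_grad():
        ref_chosen = sequence_logprob(reference, chosen_ids)
        ref_rejected = sequence_logprob(reference, rejected_ids)
    margin = (pi_chosen - ref_chosen) - (pi_rejected - ref_rejected)
    return -F.logsigmoid(beta * margin).mean()
```

The first objective corresponds to the offline, maximum likelihood approaches discussed above; the second illustrates a negative-gradient objective. On-policy variants would additionally sample responses from the current policy during training rather than relying only on a fixed preference dataset.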

Alternatives and similar repositories for understanding-rlhf

Users interested in understanding-rlhf are comparing it to the libraries listed below.
