Asap7772 / understanding-rlhf

Learning from preferences is a common paradigm for fine-tuning language models, yet many algorithmic design decisions come into play. Our new work finds that approaches employing on-policy sampling or negative gradients outperform offline maximum-likelihood objectives.
29 · Updated last year
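To make the contrast concrete, here is a minimal sketch (not the repository's code) of the two objective families the description refers to. The `mle_loss` below stands in for an offline maximum-likelihood objective, and the `dpo_loss` is the standard DPO formulation, one member of the "negative gradient" family: its gradient actively pushes down the log-probability of the dispreferred response. All tensor values are hypothetical per-sequence log-probabilities, used only for illustration.

```python
import torch
import torch.nn.functional as F

def mle_loss(logp_chosen: torch.Tensor) -> torch.Tensor:
    """Offline maximum likelihood: only raises log p(chosen).
    No term ever pushes probability mass away from the rejected response."""
    return -logp_chosen.mean()

def dpo_loss(logp_chosen: torch.Tensor,
             logp_rejected: torch.Tensor,
             ref_logp_chosen: torch.Tensor,
             ref_logp_rejected: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """Standard DPO objective. The logp_rejected term enters with a
    negative sign, so the loss gradient lowers the likelihood of the
    dispreferred response (a "negative gradient")."""
    margin = (logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected)
    return -F.logsigmoid(beta * margin).mean()

# Toy usage with hypothetical per-sequence log-probabilities.
logp_c = torch.tensor([-12.0, -9.5], requires_grad=True)
logp_r = torch.tensor([-11.0, -10.0], requires_grad=True)
ref_c = torch.tensor([-12.5, -9.8])
ref_r = torch.tensor([-11.2, -10.1])

dpo_loss(logp_c, logp_r, ref_c, ref_r).backward()
print(logp_r.grad)  # positive entries: gradient descent decreases logp_rejected
```

Under gradient descent, the positive gradient on `logp_r` means the model's probability of the rejected response is explicitly driven down, which an MLE objective on preferred responses alone never does.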

Alternatives and similar repositories for understanding-rlhf

Users interested in understanding-rlhf are comparing it to the libraries listed below.
