Asap7772 / understanding-rlhf

Learning from preferences is a common paradigm for fine-tuning language models. Yet, many algorithmic design decisions come into play. Our new work finds that approaches employing on-policy sampling or negative gradients outperform offline, maximum likelihood objectives.
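To make the distinction concrete, here is a minimal, hypothetical sketch (not code from this repository) contrasting an offline maximum-likelihood objective with a DPO-style contrastive loss, whose rejected-response term supplies the "negative gradient" that pushes probability mass away from dispreferred responses:

```python
# Hypothetical illustration only; function names and numbers are made up.
import torch
import torch.nn.functional as F

def mle_loss(logp_chosen):
    # Offline maximum likelihood: only raises log-prob of preferred responses.
    return -logp_chosen.mean()

def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected, beta=0.1):
    # DPO-style objective: the rejected-response term contributes a negative
    # gradient that explicitly lowers probability on dispreferred data.
    chosen_ratio = logp_chosen - ref_logp_chosen
    rejected_ratio = logp_rejected - ref_logp_rejected
    return -F.logsigmoid(beta * (chosen_ratio - rejected_ratio)).mean()

# Toy sequence log-probabilities under the policy and a frozen reference model.
logp_c = torch.tensor([-12.3, -9.8], requires_grad=True)
logp_r = torch.tensor([-11.0, -10.5], requires_grad=True)
ref_c = torch.tensor([-12.0, -10.0])
ref_r = torch.tensor([-11.2, -10.4])

print(mle_loss(logp_c))                        # ignores rejected responses entirely
print(dpo_loss(logp_c, logp_r, ref_c, ref_r))  # also penalizes rejected responses
```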
