CyberAgentAILab / filtered-dpo

Introducing Filtered Direct Preference Optimization (fDPO), which enhances language model alignment with human preferences by discarding preference-data samples whose quality is lower than that of samples generated by the learning model itself.
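The filtering idea described above can be sketched as follows. This is a minimal, hypothetical illustration, not the repository's actual API: the function name `filter_preference_data`, the dict keys, and the use of a generic `quality_fn` are all assumptions; in practice the quality score would come from a reward model rather than the toy stand-in used here.

```python
def filter_preference_data(pairs, model_samples, quality_fn):
    """Keep only preference pairs whose chosen response scores at least
    as high as the sample the learning model generated for that prompt.

    pairs:         list of dicts with "prompt", "chosen", "rejected"
    model_samples: one generation from the current model per prompt
    quality_fn:    scores a response (a reward model in practice)
    """
    kept = []
    for pair, sample in zip(pairs, model_samples):
        # Discard pairs where the dataset's chosen response is
        # lower quality than what the model already generates.
        if quality_fn(pair["chosen"]) >= quality_fn(sample):
            kept.append(pair)
    return kept


# Toy usage: string length stands in for a real quality/reward score.
pairs = [
    {"prompt": "p1", "chosen": "a detailed answer", "rejected": "meh"},
    {"prompt": "p2", "chosen": "ok", "rejected": "bad"},
]
model_samples = ["short", "a much longer model generation"]
filtered = filter_preference_data(pairs, model_samples, len)
# First pair is kept (17 >= 5); second is dropped (2 < 30).
```

The intuition is that once the model's own generations surpass a pair's chosen response, that pair no longer provides a useful training signal and may even degrade alignment.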
