CyberAgentAILab / filtered-dpo

Introducing Filtered Direct Preference Optimization (fDPO), which enhances language model alignment with human preferences by discarding training samples whose chosen responses are of lower quality than responses generated by the learning model.
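A minimal sketch of the filtering idea described above: preference pairs are kept only when the chosen response scores at least as well as a response freshly sampled from the current policy. The function names (`filter_preference_data`, `reward_fn`, `generate_fn`) and the dict-based dataset layout are illustrative assumptions, not the repository's actual API.

```python
def filter_preference_data(dataset, reward_fn, generate_fn):
    """Keep only preference pairs whose chosen response is judged
    at least as good as a sample from the current policy.

    dataset: iterable of dicts with "prompt", "chosen", "rejected" keys
    reward_fn(prompt, response) -> float  (quality score; assumed given)
    generate_fn(prompt) -> str           (sample from the learning model)
    """
    kept = []
    for example in dataset:
        prompt = example["prompt"]
        # Sample a response from the current policy for comparison.
        policy_sample = generate_fn(prompt)
        # Discard the pair if its chosen response is lower quality
        # than what the model already generates on its own.
        if reward_fn(prompt, example["chosen"]) >= reward_fn(prompt, policy_sample):
            kept.append(example)
    return kept
```

Standard DPO training would then proceed on the filtered subset returned by this step.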
