CyberAgentAILab / filtered-dpo

Introducing Filtered Direct Preference Optimization (fDPO), which enhances language-model alignment with human preferences by discarding training samples whose responses are lower in quality than those generated by the learning model.
15 stars · Updated 6 months ago
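The description above can be sketched as a data-filtering step run before (or between) DPO training epochs. This is a minimal illustration, not the repository's actual API: `policy_sample` and `reward` are hypothetical callables standing in for sampling from the current policy and scoring responses with a reward or quality model.

```python
# Hypothetical sketch of the fDPO filtering idea: drop preference pairs
# whose "chosen" response scores lower than a response freshly sampled
# from the current learning model. Names here are placeholders, not the
# repo's real interface.

def filter_preference_data(dataset, policy_sample, reward):
    """Keep only pairs whose chosen response is at least as good as a
    response generated by the current policy.

    dataset: iterable of (prompt, chosen, rejected) triples
    policy_sample: callable(prompt) -> generated response
    reward: callable(prompt, response) -> quality score
    """
    kept = []
    for prompt, chosen, rejected in dataset:
        generated = policy_sample(prompt)  # sample from the current policy
        # Discard the pair if the dataset's chosen response is now
        # lower quality than what the model itself produces.
        if reward(prompt, chosen) >= reward(prompt, generated):
            kept.append((prompt, chosen, rejected))
    return kept
```

The filtered dataset would then be passed to a standard DPO training loop; as the policy improves, more stale pairs are filtered out on later epochs.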

Alternatives and similar repositories for filtered-dpo

Users interested in filtered-dpo are comparing it to the libraries listed below.
