dvlab-research / Step-DPO

Implementation for "Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs"
368 · Updated 4 months ago

Alternatives and similar repositories for Step-DPO

Users who are interested in Step-DPO are comparing it to the libraries listed below.
