dvlab-research / Step-DPO

Implementation for "Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs"
329 stars · Updated 6 months ago

Alternatives and similar repositories for Step-DPO:

Users who are interested in Step-DPO are comparing it to the libraries listed below.