dvlab-research / Step-DPO

Implementation for "Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs"
285 · Updated 4 months ago

Related projects

Alternatives and complementary repositories for Step-DPO