TianduoWang / DPO-ST

[ACL 2024] Self-Training with Direct Preference Optimization Improves Chain-of-Thought Reasoning
30 · Updated 3 months ago

Related projects

Alternatives and complementary repositories for DPO-ST