TianduoWang / DPO-ST

[ACL 2024] Self-Training with Direct Preference Optimization Improves Chain-of-Thought Reasoning
41 · Updated 7 months ago

Alternatives and similar repositories for DPO-ST:

Users who are interested in DPO-ST are comparing it to the repositories listed below.