TianduoWang / DPO-ST
[ACL 2024] Self-Training with Direct Preference Optimization Improves Chain-of-Thought Reasoning
53 stars · Jul 28, 2024 · Updated last year

Alternatives and similar repositories for DPO-ST
