TianduoWang / DPO-ST

[ACL 2024] Self-Training with Direct Preference Optimization Improves Chain-of-Thought Reasoning
49 · Updated last year

Alternatives and similar repositories for DPO-ST

Users who are interested in DPO-ST are comparing it to the libraries listed below.
