TianduoWang / DPO-ST
[ACL 2024] Self-Training with Direct Preference Optimization Improves Chain-of-Thought Reasoning
☆52 · Updated last year
Alternatives and similar repositories for DPO-ST
Users who are interested in DPO-ST are comparing it to the repositories listed below.