zwhong714 / PSFT

PSFT is a trust-region–inspired fine-tuning objective that views SFT as a policy gradient method with constant advantages, constraining policy drift to stabilize training and improve generalization.
25 · Updated last week
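
One way to read the description above is as a PPO-style clipped surrogate in which every demonstration token has the same advantage. The sketch below illustrates that reading only; it is not taken from the PSFT repository. The function name `clipped_sft_loss`, the `clip_eps` default of 0.2, and the use of log-probabilities from a frozen "old" policy are all assumptions made for illustration.

```python
import math


def clipped_sft_loss(new_logprobs, old_logprobs, clip_eps=0.2, advantage=1.0):
    """Hypothetical sketch: PPO-style clipped surrogate with a constant advantage.

    new_logprobs: per-token log-probs of the demonstration under the current policy.
    old_logprobs: per-token log-probs under a frozen earlier policy (assumed).
    Returns the negated mean surrogate, so lower is better.
    """
    total = 0.0
    for lp_new, lp_old in zip(new_logprobs, old_logprobs):
        # Importance ratio between the current and the old policy for this token.
        ratio = math.exp(lp_new - lp_old)
        # Clamp the ratio to [1 - eps, 1 + eps]; this is the trust-region-like constraint.
        clipped = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
        # Pessimistic choice between the unclipped and the clipped term.
        total += min(ratio * advantage, clipped * advantage)
    return -total / len(new_logprobs)
```

With a positive constant advantage, the objective stops rewarding a token once its probability ratio exceeds `1 + clip_eps`. That is one plausible meaning of "constraining policy drift", whereas plain SFT keeps pushing the likelihood up without such a cap.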

Alternatives and similar repositories for PSFT

Users who are interested in PSFT are comparing it to the repositories listed below.
