zwhong714 / PSFT

[ICLR 2026] PSFT is a trust-region–inspired fine-tuning objective that views SFT as a policy gradient method with constant advantages, constraining policy drift to stabilize training and improve generalization.
35 · Sep 9, 2025 · Updated 5 months ago
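
The description above can be illustrated with a minimal sketch. It assumes, without confirmation from the repository, that "policy gradient with constant advantages" plus a trust-region constraint resembles the PPO clipped surrogate with the advantage fixed at 1 for every target token. The function name `clipped_sft_loss`, the `epsilon` default, and the list-of-log-probabilities interface are all hypothetical and are not taken from the PSFT code.

```python
import math


def clipped_sft_loss(new_logps, old_logps, epsilon=0.2):
    """Mean clipped surrogate loss over target tokens (illustrative only).

    new_logps: log-probabilities of the target tokens under the current policy.
    old_logps: log-probabilities of the same tokens under the reference (old) policy.
    epsilon:   hypothetical clip range limiting how far the ratio may drift.
    """
    if not new_logps or len(new_logps) != len(old_logps):
        raise ValueError("inputs must be non-empty and of equal length")

    total = 0.0
    for new_lp, old_lp in zip(new_logps, old_logps):
        # Probability ratio between the current and the old policy.
        ratio = math.exp(new_lp - old_lp)
        clipped = min(max(ratio, 1.0 - epsilon), 1.0 + epsilon)
        # Assumption: advantage A = 1 for every token, so the PPO surrogate
        # min(ratio * A, clipped * A) reduces to min(ratio, clipped).
        total += min(ratio, clipped)

    # Negate because the surrogate is maximized while a loss is minimized.
    return -total / len(new_logps)
```

Under this assumption, a token whose ratio already exceeds `1 + epsilon` contributes a constant to the loss, so it stops pushing the policy further from the old one. That is one way to read "constraining policy drift" in the description.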

Alternatives and similar repositories for PSFT

Users who are interested in PSFT compare it to the libraries listed below.
