princeton-nlp / unintentional-unalignment

[ICLR 2025] Unintentional Unalignment: Likelihood Displacement in Direct Preference Optimization
19 · Updated 3 weeks ago

Alternatives and similar repositories for unintentional-unalignment:

Users who are interested in unintentional-unalignment are comparing it to the libraries listed below.