princeton-nlp / unintentional-unalignment

Unintentional Unalignment: Likelihood Displacement in Direct Preference Optimization
13 · Updated last month

Related projects

Alternatives and complementary repositories for unintentional-unalignment