princeton-nlp / SimPO

SimPO: Simple Preference Optimization with a Reference-Free Reward
640 · Updated 3 weeks ago

Related projects: