uclaml / SPPO

The official implementation of Self-Play Preference Optimization (SPPO)
498 · Updated 3 months ago

Related projects

Alternatives and complementary repositories for SPPO