uclaml / SPPO
The official implementation of Self-Play Preference Optimization (SPPO)
☆565 Updated 4 months ago
Alternatives and similar repositories for SPPO
Users who are interested in SPPO are comparing it to the repositories listed below:
- Codebase for Iterative DPO Using Rule-based Rewards ☆246 Updated last month
- Mulberry, an o1-like Reasoning and Reflection MLLM Implemented via Collective MCTS