LeapLabTHU / JustGRPO

Minimalist RL for Diffusion LLMs with SOTA reasoning performance (89.1% GSM8K). Official implementation of "The Flexibility Trap".
75 · Updated this week

Alternatives and similar repositories for JustGRPO

Users who are interested in JustGRPO are comparing it to the libraries listed below.
