cnsdqd-dyb / Guide-GRPO

Optimizes language models through guidance tokens in reasoning chains, based on DeepSeekRL-Extended. Aims for memory-efficient training on consumer GPUs with 24GB of VRAM.
25 · Updated last month

Alternatives and similar repositories for Guide-GRPO:

Users who are interested in Guide-GRPO are comparing it to the libraries listed below