cnsdqd-dyb / Guide-GRPO
Memory-efficient training on consumer GPUs (targets 24 GB of VRAM). Optimizes language models through guidance tokens in reasoning chains. Based on DeepSeekRL-Extended.
29 · Feb 23, 2025 · Updated last year

Alternatives and similar repositories for Guide-GRPO

Users who are interested in Guide-GRPO are comparing it to the libraries listed below.
